Preface



The mathematical theory and practice of cryptography and coding underpins the 
provision of effective security and reliability for data communication, processing, and 
storage. Theoretical and implementational advances in the fields of cryptography and 
coding are therefore a key factor in facilitating the growth of data communications 
and data networks of various types. Thus, this Eighth International Conference in an
established and successful IMA series on the theme of “Cryptography and Coding” 
was both timely and relevant. The theme of this conference was the future of coding 
and cryptography, which was touched upon in presentations by a number of invited 
speakers and researchers. 

The papers that appear in this book include recent research and development in 
error control coding and cryptography. These start with mathematical bounds, 
statistical decoding schemes for error correcting codes, and undetected error 
probabilities and continue with the theoretical aspects of error correction coding such 
as graph and trellis decoding, multifunctional and multiple access communication 
systems, low-density parity-check codes, and iterative decoding. These are followed
by papers on key recovery attacks, authentication, stream cipher design, analysis of
ECIES algorithms, and lattice attacks on IP-based protocols.

It is also my pleasant task to place on record my appreciation of the help and 
support of the members of the conference organizing committee, namely Mike 
Darnell, Paddy Farrell, Mick Ganley, John Gordon, Chris Mitchell, Fred Piper, and 
Mike Walker. I wish also to express my sincere thanks to Pamela Bye, Suzanne 
Coleman, and Terry Edwards of the IMA for their help with both the organization of 
the conference and the publication of this volume. Finally, my special thanks go to 
my colleague, Phillip Benachour, for his assistance in editing and preparing the 
camera-ready copies to a very tight schedule. 
Bahram Honary



A Statistical Decoding Algorithm for General
Linear Block Codes



A. Al Jabri

EE Dept, College of Eng., P.O. Box 800
King Saud University, Riyadh 11421, Saudi Arabia
aljabri@ksu.edu.sa

Abstract. This paper introduces a new decoding algorithm for general 
linear block codes. The algorithm generates a direct estimate of the error 
locations based on exploiting the statistical information embedded in the 
classical syndrome decoding. The algorithm can be used to cryptanalyze 
many algebraic-code public-key crypto and identification systems. In 
particular results show that the McEliece public-key cryptosystem with 
its original parameters is not secure. 

Keywords: Decoding, General Linear Block Codes, McEliece System, 
Statistical. 



1 Introduction 



The problem of decoding an arbitrary linear block code is known to be NP-hard.
In practice, codes are usually designed with a certain algebraic structure
that can be exploited to speed up the decoding process.

For general linear block codes with no obvious structure, one usually uses
syndrome or probabilistic decoding algorithms [2,3]. The latter class includes
the widely used information set decoding algorithm. Generally, these algorithms
are not efficient in computation and/or storage, especially for large codes.

Randomly generated codes, on the other hand, are typically good [5]. In fact,
the minimum distance, d, of an (n, k) randomly generated code is related to the
rate of the code by

$$\frac{k}{n} \approx 1 - h\!\left(\frac{d}{n}\right),$$

where $h(x)$, $0 < x < 1$, is the binary entropy function defined by
$h(x) = x \log_2(1/x) + (1 - x)\log_2(1/(1 - x))$. For half-rate codes, for
example, the above relation gives $d \approx 0.11\,n$, so up to about $0.055\,n$
errors can be corrected by these codes.
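The half-rate figure can be checked numerically. The following is a minimal sketch that assumes the relation has the Gilbert-Varshamov form $k/n = 1 - h(d/n)$ and solves it by bisection; the function names are illustrative and do not come from the paper.

```python
import math

def binary_entropy(x: float) -> float:
    """h(x) = x*log2(1/x) + (1-x)*log2(1/(1-x)) for 0 < x < 1."""
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def relative_distance(rate: float, tol: float = 1e-12) -> float:
    """Solve 1 - h(delta) = rate for delta in (0, 1/2) by bisection."""
    lo, hi = 1e-12, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # h is increasing on (0, 1/2), so 1 - h(mid) decreases as mid grows.
        if 1 - binary_entropy(mid) > rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

delta = relative_distance(0.5)
print(f"d/n ~ {delta:.4f}, correctable fraction t/n ~ {delta / 2:.4f}")
```

For rate 1/2 the root lies close to 0.11, which is where the approximation $d \approx 0.11\,n$ comes from; halving it gives the fraction of correctable errors.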

An (n, k) linear block code of length n with k information bits is usually
characterized by its generator matrix, G, or equivalently by its parity-check
matrix, H. These matrices are related by

$$G H^T = 0,$$

where T denotes the transpose operation. Here we assume binary block codes
and that the code can correct up to t errors. To encode, the information vector
u is multiplied by G to obtain the codeword vector c. That is,

$$c = uG.$$

The received vector y is the codeword c plus a random binary error vector e:

$$y = c \oplus e.$$

For e to be correctable, its Hamming weight must be less than or equal to t.

In classical syndrome decoding, a table of all possible correctable error vectors
e and the corresponding syndromes $s = eH^T$ is first constructed. To decode, the
received vector y is first multiplied by $H^T$ to obtain s, and hence e, from the
lookup table. This follows since

$$yH^T = (c \oplus e)H^T = eH^T = s$$

and the fact that $cH^T = 0$.
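Classical syndrome decoding as described above can be sketched in a few lines. The example below is only an illustration on a (7,4) Hamming code in systematic form; the helper names `build_syndrome_table` and `syndrome_decode` are hypothetical, and NumPy is assumed to be available.

```python
import itertools
import numpy as np

# (7,4) Hamming code in systematic form: G = [I : P], H = [P^T : I].
P_T = np.array([[0, 1, 1, 1],
                [1, 1, 1, 0],
                [1, 0, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P_T.T])
H = np.hstack([P_T, np.eye(3, dtype=int)])
assert not ((G @ H.T) % 2).any()            # G H^T = 0 over GF(2)

def build_syndrome_table(H, t):
    """Map each syndrome s = e H^T to one error vector e of weight <= t."""
    n = H.shape[1]
    table = {}
    for weight in range(t + 1):
        for positions in itertools.combinations(range(n), weight):
            e = np.zeros(n, dtype=int)
            for p in positions:
                e[p] = 1
            table.setdefault(tuple((e @ H.T) % 2), e)
    return table

def syndrome_decode(y, H, table):
    """Look up e from s = y H^T and return the codeword estimate y + e."""
    e = table[tuple((y @ H.T) % 2)]
    return (y + e) % 2

table = build_syndrome_table(H, t=1)
u = np.array([1, 0, 1, 1])
c = (u @ G) % 2
y = c.copy()
y[5] ^= 1                                   # single error in position 6
assert (syndrome_decode(y, H, table) == c).all()
```

The table holds one entry per correctable error pattern, which is the storage cost that the rest of the paper compares against.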

2 The Statistical Decoding Concept 

Let $\mathcal{C}$ and $\mathcal{H}$ be the sets of all codewords generated by the code G and its dual
H, respectively. For any $c \in \mathcal{C}$ and $h \in \mathcal{H}$ the following holds:

$$c h^T = 0.$$

If $y h^T = 1$, then this particular h is said to have provided odd error detection
for y. Similarly, if $y h^T = 0$, then h is said to have provided even error detection
for y. This latter case also includes the no-error case.

Now consider the case when the odd detection process is restricted to those
elements of $\mathcal{H}$ with large weights. Let this subset be denoted by $\mathcal{H}_w$. Note that
small weights can be used instead as well. In general, the vector h, in the
product $y h^T$, acts as a mask or a filter on y. If $y h^T = 1$, or equivalently $e h^T = 1$,
then it is very likely that the components of h at the error positions are
ones.

If all the vectors $h \in \mathcal{H}_w$ satisfying the condition $y h^T = 1$ for the given y are
added over the integers, then positions with errors will generally have higher
frequencies than those with no errors. Asymptotic expressions for these frequencies
are given later. Let the resulting vector be v. That is,

$$v = \sum_{h \in \mathcal{H}_w} (y h^T)\, h.$$

The operation $e h^T = 1$ splits $\mathcal{H}_w$ into two subsets. Both provide information
about the error positions. One subset, however, contains the error position
information within the highest m ($\ge t$) values of v, while the other has the
information in the lowest m values, for some m with $t \le m \le n - k$. The selection of
the highest or the lowest depends respectively on whether t is odd or even. The
parameter m is introduced here to account for the ambiguity in locating the t error
positions. In fact, due to the statistical nature of the computation of v, the errors
will not be confined to the t maximum values but will generally be among a
larger number of positions, m in this case. This value (threshold) m should be
set to guarantee that all error patterns are correctable.

Since weight(e) is not known a priori, the decoder has to assume two scenarios.
The first is that weight(e) is odd and the errors are within the positions
corresponding to the highest m values of v. The second is that weight(e) is
even and the errors are within the lowest m values of v. Once the
m error candidate positions are determined, one can choose a subvector
$y_k$ of k bits from the remaining error-free positions of y and the corresponding
submatrix $G_k$ from G, and calculate $y_k G_k^{-1}$ if the inverse exists, or otherwise
try another selection. Let $u_1$ and $u_2$ be the corresponding solutions for the two
scenarios. That is,

$$u_i = y_{k_i} G_{k_i}^{-1}, \quad i = 1, 2.$$

The decoder can find the correct solution by checking the weights of $u_1 G \oplus y$
and $u_2 G \oplus y$, then selecting the $u_i$ that yields a weight less than or equal to t.
The above is summarized in the following algorithm.

The Statistical Decoding Algorithm:

Input: $\mathcal{H}_w$ (see Note 1), y.

Output: $\hat{u}$, an estimate of the information vector u.

1. Calculate the error-locating vector v:

   $$v = \sum_{h \in \mathcal{H}_w} (y h^T)\, h.$$

2. Calculate $u_1$ and $u_2$:

   $$u_i = y_{k_i} G_{k_i}^{-1}, \quad i = 1, 2.$$

3. Check

   $$\mathrm{weight}(u_i G \oplus y), \quad i = 1, 2,$$

   and choose the $u_i$ that yields weight $\le t$ and set the result to $\hat{u}$.

Note 1: The set $\mathcal{H}_w$ has to be generated and stored in advance. There are many efficient
algorithms for such generation (see, for example, [2,7]).
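The three steps can be sketched as follows. This is an illustrative reading of the algorithm, not the author's implementation: it tries the odd-weight hypothesis first, takes the first invertible selection of k trusted columns, and uses a small Gauss-Jordan solver over GF(2). The names `solve_gf2` and `statistical_decode` are hypothetical, and the (7,4) Hamming code is used only as a toy test case.

```python
import itertools
import numpy as np

def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination; None if A is singular."""
    A, b = A.copy() % 2, b.copy() % 2
    size = A.shape[0]
    for col in range(size):
        pivot = next((r for r in range(col, size) if A[r, col]), None)
        if pivot is None:
            return None
        if pivot != col:
            A[[col, pivot]] = A[[pivot, col]]
            b[[col, pivot]] = b[[pivot, col]]
        for r in range(size):
            if r != col and A[r, col]:
                A[r] ^= A[col]
                b[r] ^= b[col]
    return b

def statistical_decode(y, G, Hw, t, m):
    k, n = G.shape
    selected = (Hw @ y) % 2                 # y h^T for every h in Hw
    v = selected @ Hw                       # step 1: integer sum of the selected h
    order = np.argsort(v, kind="stable")
    # Odd-weight hypothesis: errors among the m largest values of v;
    # even-weight hypothesis: errors among the m smallest values.
    for suspects in (order[-m:], order[:m]):
        suspect_set = set(suspects.tolist())
        trusted = [j for j in range(n) if j not in suspect_set]
        for cols in itertools.combinations(trusted, k):
            cols = list(cols)
            # step 2: u = y_k G_k^{-1}, i.e. solve G_k^T u^T = y_k^T
            u = solve_gf2(G[:, cols].T, y[cols])
            if u is None:
                continue                    # G_k singular: try another selection
            if int(((u @ G + y) % 2).sum()) <= t:   # step 3
                return u
            break                           # this hypothesis is rejected
    return None

# Toy setup: (7,4) Hamming code, Hw = all nonzero dual codewords (all of weight 4).
P_T = np.array([[0, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P_T.T])
H = np.hstack([P_T, np.eye(3, dtype=int)])
Hw = np.array([(np.array(a) @ H) % 2
               for a in itertools.product([0, 1], repeat=3) if any(a)])

u = np.array([1, 0, 1, 1])
y = (u @ G) % 2
y[1] ^= 1                                   # single error in position 2
print(statistical_decode(y, G, Hw, t=1, m=1))
```

Only the vectors in `Hw` are stored, rather than a full syndrome table; for larger codes the choice of m and of the subset `Hw` would have to be found by search, as the paper does in its later examples.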
3 Applications



In what follows we use codes equivalent to some well-known codes to test the
performance of the proposed algorithm. Without loss of generality, we assume,
in all cases, that G is in the form $[I : P]$, or equivalently $H = [P^T : I]$.
a. (7,4,3) Hamming Code:

This is a trivial case and is presented to clarify the idea.

Consider a (7,4) Hamming code. The parity-check matrix of the code, $H = [P^T : I]$,
is determined by

$$P^T = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{bmatrix}.$$

Here the weight distribution of the dual code is $w_4 = 7$, that is, there are seven
dual codewords of weight 4, and $\mathcal{H}_4$ is

$$\mathcal{H}_4 = \{(1011001), (0101011), (0010111), (1110010), (1001110), (0111100), (1100101)\}.$$

In this case m = t = 1. The error location corresponds to the position
in v with the highest value. For example, a single error at position 2 results
in

$$v = (2, 4, 2, 2, 2, 2, 2).$$
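The error-locating vector of this example can be reproduced directly from the seven weight-4 dual codewords listed above. The sketch below assumes NumPy and uses the all-zero codeword as the transmitted word, which is no restriction because $y h^T = e h^T$; the function name is illustrative.

```python
import numpy as np

H4 = np.array([[int(bit) for bit in word] for word in (
    "1011001", "0101011", "0010111", "1110010",
    "1001110", "0111100", "1100101")])

def error_locating_vector(y, Hw):
    """v = sum over h in Hw of (y h^T) h, with the outer sum taken over the integers."""
    return ((Hw @ y) % 2) @ Hw

y = np.zeros(7, dtype=int)   # all-zero codeword ...
y[1] = 1                     # ... with a single error at position 2 (1-based)
v = error_locating_vector(y, H4)
print(v)                     # [2 4 2 2 2 2 2]
```

The largest entry of v marks the error position, as stated in the text.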



b. (23,12,3) Golay Code: 

This code can correct up to 3 errors. For this code

$$P = \begin{bmatrix}
1&1&0&0&0&1&1&1&0&1&0 \\
0&1&1&0&0&0&1&1&1&0&1 \\
1&1&1&1&0&1&1&0&1&0&0 \\
0&1&1&1&1&0&1&1&0&1&0 \\
0&0&1&1&1&1&0&1&1&0&1 \\
1&1&0&1&1&0&0&1&1&0&0 \\
0&1&1&0&1&1&0&0&1&1&0 \\
0&0&1&1&0&1&1&0&0&1&1 \\
1&1&0&1&1&1&0&0&0&1&1 \\
1&0&1&0&1&0&0&1&0&1&1 \\
1&0&0&1&0&0&1&1&1&1&1 \\
1&0&0&0&1&1&1&0&1&0&1
\end{bmatrix}.$$



The dual code has 253 codewords of weight 16. These vectors are sufficient
to locate all the error patterns of weight 3 or less. Table 1 shows the
values of the components of v in the error and the error-free positions for different
error weights.

Table 1. Frequency in v

  t    Error positions    Correct positions
  1          176                 120
  2           56                  80
  3           96                  88

The number of possible error patterns is $\binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 2047$. For
syndrome decoding, one needs to store all these vectors and their syndromes.
Comparing this with the 253 vectors needed in the statistical decoding approach
indicates the efficiency of the proposed algorithm. Note that for this case m = t.
This, however, is not generally the case for all codes, as shown in the following
example.
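The frequencies in Table 1 can be checked by brute force. The sketch below does not use the P matrix of the paper; it assumes instead the cyclic (23,12) Golay code generated by $g(x) = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1$, and relies on the fact that the dual of the Golay code is its even-weight subcode, so the weight-16 dual codewords are exactly the weight-16 codewords. Polynomials are stored as Python integers (bit i is the coefficient of $x^i$); all names are illustrative.

```python
G_POLY = 0b101011100011   # x^11 + x^9 + x^7 + x^6 + x^5 + x + 1

def clmul(a, b):
    """Carry-less (GF(2)) product of two polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

# Every codeword is message(x) * g(x) with deg(message) < 12, so the product
# has degree < 23 and no reduction modulo x^23 - 1 is needed.
codewords = [clmul(G_POLY, msg) for msg in range(1 << 12)]
H16 = [c for c in codewords if bin(c).count("1") == 16]

def frequencies(error_positions):
    """Components of v for the given error positions (0-based)."""
    e = sum(1 << p for p in error_positions)
    selected = [h for h in H16 if bin(h & e).count("1") % 2 == 1]
    return [sum((h >> j) & 1 for h in selected) for j in range(23)]

for errors in ([0], [0, 5], [0, 5, 11]):
    v = frequencies(errors)
    in_error = {v[j] for j in errors}
    error_free = {v[j] for j in range(23) if j not in errors}
    print(len(errors), in_error, error_free)
```

For even error weight the error positions show the lower value (56 against 80), which is why the decoder looks at the lowest values of v under the even-weight hypothesis.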



c. (31,11,5) BCH Codes: 

Consider a code equivalent to the (31,11,5) BCH code obtained by permuting 
the columns of the generator matrix. The P matrix is given below. 



$$P = \begin{bmatrix}
1&0&1&1&0&0&0&1&0&0&1&1&0&1&1&0&1&0&1&0 \\
0&1&0&1&1&0&0&0&1&0&0&1&1&0&1&1&0&1&0&1 \\
1&0&0&1&1&1&0&1&0&1&1&1&1&0&1&1&0&0&0&0 \\
0&1&0&0&1&1&1&0&1&0&1&1&1&1&0&1&1&0&0&0 \\
0&0&1&0&0&1&1&1&0&1&0&1&1&1&1&0&1&1&0&0 \\
0&0&0&1&0&0&1&1&1&0&1&0&1&1&1&1&0&1&1&0 \\
0&0&0&0&1&0&0&1&1&1&0&1&0&1&1&1&1&0&1&1 \\
1&0&1&1&0&1&0&1&1&1&0&1&1&1&0&1&0&1&1&1 \\
1&1&1&0&1&0&1&1&1&1&0&1&1&0&0&0&0&0&0&1 \\
1&1&0&0&0&1&0&0&1&1&0&1&1&0&1&0&1&0&1&0 \\
0&1&1&0&0&0&1&0&0&1&1&0&1&1&0&1&0&1&0&1
\end{bmatrix}$$



It can be shown that the dual code has 186 codewords of weight 26 and that
these vectors are sufficient to correct all error patterns of weight 5 or less. For
this example, m is found to be 10, obtained by a computer search. The
number of error patterns needed in the syndrome table construction is 206367.
Exhaustive search of the codewords, on the other hand, requires testing
$2^{11}$ codewords. In either case, comparing this with the 186 vectors needed in the
proposed algorithm indicates its high efficiency.

Next we consider a more serious case. 

d. (32, 16, 2) Random code: 

This is a randomly generated (32,16,5) code. It can be shown (exhaustive 
search) that the minimum distance for this code is 5. That is, the code can 
correct 2 random errors. The P matrix is 
$$P = \begin{bmatrix}
0&1&1&1&1&1&1&0&1&0&1&1&0&0&1&0 \\
0&0&1&1&0&1&0&0&1&0&0&1&1&0&0&0 \\
0&0&0&1&0&0&0&0&0&1&0&1&1&1&1&1 \\
0&1&1&0&0&1&0&1&1&0&1&0&0&1&1&0 \\
1&1&0&1&1&0&0&1&1&0&0&1&1&1&1&1 \\
1&0&0&1&0&1&0&0&0&1&1&1&1&0&0&1 \\
1&0&1&1&0&0&1&0&0&0&0&1&1&0&0&1 \\
1&1&1&1&0&0&1&1&0&1&1&1&0&1&1&0 \\
1&0&1&1&1&1&0&0&0&1&1&1&1&1&1&1 \\
0&1&1&1&1&1&1&1&1&0&1&1&1&0&1&1 \\
0&0&1&0&1&1&0&0&1&0&1&1&1&1&1&1 \\
1&1&1&0&0&1&1&0&0&0&1&0&0&1&0&1 \\
1&1&1&1&1&0&1&1&1&1&1&0&0&1&0&1 \\
1&1&1&1&1&1&0&0&1&1&1&1&0&0&0&1 \\
0&1&1&0&1&1&0&1&1&1&1&0&1&0&0&1 \\
0&1&1&1&1&1&1&1&1&1&0&1&0&1&0&1
\end{bmatrix}$$



A random search is performed for $\mathcal{H}_w$, and it is found that 28
vectors are sufficient for decoding all errors of weight $\le 2$. Here m is taken to
be 14.

Suppose a single error has occurred in position 10. In this case v will be

14 16 18 14 15 16 13 15 17 21 15 15 15 16 17 15
15 14 17 15 17 17 15 13 15 17 17 15 17 17 16 15

Note that position 10 has the highest value (21 in this case) among the values
of v. Now suppose that two errors have occurred, in positions 10 and 20. In this
case v will be

9 9 7 11 9 10 10 12 10 6 9 8 8 10 9 10
8 10 8 6 10 8 9 9 9 8 9 9 9 9 9 11

Note again that positions 10 and 20 have the smallest values (6 in
this case) among the values of v. Similar calculations can be done for other error
patterns. One can find the different thresholds to detect and locate different kinds
of correctable error patterns. From these a single threshold can be found for all
error patterns.



ASYMPTOTIC BEHAVIOR OF THE ALGORITHM

Here we test the performance of the proposed algorithm on the large codes usually
used in designing cryptosystems. The decoding problem of general linear block
codes is known to be NP-hard. This fact is used as a basis for designing many
public-key and identification systems [4,7]. A typical code size for these applications
is (1024,512).

e. McEliece Public-key Cryptosystem: (Goppa (1024,512,51))

In this system, the public key $\hat{G}$ is a $k \times n$ matrix composed of three private
matrices: a $k \times k$ scrambling matrix S, a $k \times n$ Goppa code generator matrix
G, and an $n \times n$ permutation matrix P, such that $\hat{G} = SGP$. To encrypt, the
information vector is first multiplied by $\hat{G}$, then an error vector e of weight t or
less is added. That is, the ciphertext vector y is given by

$$y = u\hat{G} \oplus e.$$

Knowledge of the private keys enables one to easily decode y, while this is not the
case with $\hat{G}$ alone. Codes of large sizes are proposed for this system. These codes
generally behave like randomly generated codes, and one can reasonably assume
that the binomial distribution is a good approximation for the weights [5].

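Using this approximation, it can be shown that the probability, p, in the
erroneous positions of v is given by

$$p = \frac{\displaystyle\sum_{m\ \mathrm{odd}} \binom{n-t}{w-m}\binom{t-1}{m-1}}{\displaystyle\sum_{m\ \mathrm{odd}} \binom{t}{m}\binom{n-t}{w-m}}$$

and the probability, q, in the error-free positions of v is given by

$$q = \frac{\displaystyle\sum_{m\ \mathrm{odd}} \binom{n-t-1}{w-m-1}\binom{t}{m}}{\displaystyle\sum_{m\ \mathrm{odd}} \binom{t}{m}\binom{n-t}{w-m}},$$

where w is the weight of the vectors in $\mathcal{H}_w$ and the sums run over the odd values of m up to t.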
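The two probabilities are easy to evaluate exactly. The sketch below assumes the reading that p and q are conditional probabilities for a vector h drawn uniformly from all weight-w vectors of length n, given that h overlaps the t error positions in an odd number of places; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom(n, k):
    """Binomial coefficient that is 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def p_and_q(n, w, t):
    """Return (p, q) for code length n, vector weight w and error weight t."""
    odd = range(1, t + 1, 2)
    total = sum(binom(t, m) * binom(n - t, w - m) for m in odd)
    p_num = sum(binom(t - 1, m - 1) * binom(n - t, w - m) for m in odd)
    q_num = sum(binom(t, m) * binom(n - t - 1, w - m - 1) for m in odd)
    return Fraction(p_num, total), Fraction(q_num, total)

print(p_and_q(7, 4, 1))      # (7,4) Hamming example
print(p_and_q(23, 16, 3))    # Golay example with three errors
```

For the Hamming example this gives p = 1 and q = 1/2, matching v = (2, 4, 2, 2, 2, 2, 2), where 4 of 4 selected vectors cover the error position and 2 of 4 cover each other position. For the Golay code with t = 3 it gives 3/4 and 11/16, which agree with the ratio 96 : 88 of Table 1.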



We are interested in estimating the number of h vectors, N, required for there
to be a 0.95 probability that the relative frequency estimate for the probability
of an error event is within $\epsilon$ of p. For large codes, it is noticed that the
difference between p and q is very small. This puts some restriction on the value
of $\epsilon$. One can use the central limit theorem to bound this number. Let $f_e(N)$ be
the relative frequency of the error event in some position. Since $f_e(N)$ has mean
p and variance $p(1-p)/N$, it can be shown [6] that, for a 0.95 probability,

A^ = 625 x lQ-^p{l-p)€~‘^. 

Using this, the number of h vectors required to identify the erroneous places in 
McEliece system is 2^®. Our experimentation shows that for a single 700 MHz 
PENTIUM II processor, one needs a round 2® hrs per decryption. This can be 
significantly reduced using parallel computations. One needs to note that a large 
number of vectors has to be stored. This, however, is within the reach of today’s 
technology. 



4 Conclusion 

In this paper, a new algorithm for decoding general linear block codes is proposed.
The algorithm is of a different flavor from classical decoding algorithms and
is much more efficient in the examples considered. Preliminary results on its
performance have been presented. The algorithm can be used to decode many good
linear codes that can be constructed with no obvious algebraic structure. It also
suggests a possible class of attacks on crypto and identification systems based on
the difficulty of the decoding problem for general linear block codes.



Acknowledgment. The author would like to thank Prof. Paddy Farrell for the
discussion regarding the subject of the paper.
On the Undetected Error Probability for Shortened Hamming Codes

C. Lange and A. Ahrens

Abstract. In this contribution, the probability of undetected errors on
channels with memory is determined. Different error detection strategies
(e.g. shortened Hamming codes), which have been adopted as international
standards, are analyzed. A setup is presented which is based
on a classification of error patterns in blocks by error-length and error-weight.
This approach is used to determine the probability of an error
pattern being a valid codeword and consequently not detectable. For
these investigations a digital channel model, whose characteristics were
found by analyzing real shortwave channel connections, is used. As
a result of the investigation it is shown that, on channels with memory,
generator polynomials of the same code rate are the more efficient
the more equidistantly the exponents in the generator polynomial are
distributed.



1 Introduction 

Hamming codes or shortened Hamming codes are widely used for error detection
in data communication systems. For example, a distance-4 cyclic Hamming
code with 16 bits for error detection has been adopted by the CCITT (Recommendation
X.25) for use in packet-switched data networks [6], [5]. The code is
generated either by the polynomial

$$G_1(z) = (z+1)(z^{15} + z^{14} + z^{13} + z^{12} + z^4 + z^3 + z^2 + z + 1) = z^{16} + z^{12} + z^5 + 1 \quad \text{(CRC-CCITT, Code-1)} \qquad (1)$$

or by the polynomial

$$G_2(z) = (z+1)(z^{15} + z^{14} + 1) = z^{16} + z^{14} + z + 1 \quad \text{(Code-2)} \qquad (2)$$

where $z^{15} + z^{14} + z^{13} + z^{12} + z^4 + z^3 + z^2 + z + 1$ and $z^{15} + z^{14} + 1$ are primitive
polynomials of degree 15. The natural length of this code is $n = 2^{15} - 1 = 32767$.
In practice, the length of a data packet is no more than a few thousand bits,
which is much shorter than the natural length of the code. Consequently, a
shortened version of the code is used. Often, the length of a data packet varies
from a few hundred bits to a few thousand bits [5].

* This contribution was presented in parts at the Nordic HF Conference, Faro/Sweden,
August 2001.

B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 9-19, 2001.
© Springer-Verlag Berlin Heidelberg 2001
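The two factorizations can be checked with carry-less multiplication over GF(2). The sketch below stores a polynomial as an integer whose bit i is the coefficient of $z^i$; the factor $z^{15} + z^{14} + 1$ for Code-2 is the reading assumed here, and the helper names are illustrative.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of two polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def exponents(poly: int) -> list:
    """Exponents with nonzero coefficient, highest first."""
    return [i for i in range(poly.bit_length() - 1, -1, -1) if (poly >> i) & 1]

Z_PLUS_1 = 0b11
FACTOR_1 = 0xF01F   # z^15 + z^14 + z^13 + z^12 + z^4 + z^3 + z^2 + z + 1
FACTOR_2 = 0xC001   # z^15 + z^14 + 1 (assumed factor of Code-2)

code_1 = clmul(Z_PLUS_1, FACTOR_1)
code_2 = clmul(Z_PLUS_1, FACTOR_2)
print(exponents(code_1))   # [16, 12, 5, 0]
print(exponents(code_2))   # [16, 14, 1, 0]
```

Both generator polynomials have four nonzero coefficients and degree 16, so the generator codeword has length 17 and weight 4 in either case.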

An undetected error occurs if a transmitted codeword is distorted by the
channel and appears as a different valid codeword on the receiver side [2]. The
probability of an undetected error is a commonly used indicator for the reliability
of a given code-channel combination [12]. To analyze the performance
of such error detection strategies, time-consuming simulations are necessary. Often,
either the weight distribution of the code or that of its dual code is used to determine
the reliability of a given code-channel combination [6], [7], [11], [13], or numerical
results can be found in the literature [12]. Asymptotic bounds are an insufficient
measure since they do not take the real channel conditions into account [13]. By
analyzing real shortwave channel connections, it was found in [8] and [9] that a
classification of error patterns by the error-length and the error-weight leads to
better results than working only with the error-weight distribution. The reason
for this is that not all error patterns appear equally distributed on the real HF
channel. For the determination of the performance of different error detection
strategies, the error patterns in blocks of the length n are classified in this paper
by the error-length l and the error-weight g. In Fig. 1 an error pattern of the
length l = 6 with the weight g = 3 in a block of length n = 9 is shown.



Fig. 1. Error pattern in a block of the length n (shown: n = 9, l = 6, g = 3)



With the digital channel model derived in [3], we are able to generate typical
shortwave error patterns. This model is based on the assumption

$$p_B(n) = p_e \cdot n^{\alpha} \quad \text{for } n \le n_{\max}, \qquad (3)$$

where the block-error rate $p_B(n)$ is described as a function of the block-length
n, the bit error rate $p_e$ (BER) and the error concentration value $(1 - \alpha)$ within
the range $0 \le (1 - \alpha) \le 0.5$ [3], [9]. The value of $n_{\max}$ indicates the
maximum block length up to which the model assumption can be maintained, with
$p_B(n = n_{\max}) = 1$. This digital channel model takes the interdependence of
errors into account, as can be expected on many HF channels.
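The model assumption (3) can be written down directly. In the sketch below the function names are illustrative, the clipping of the block-error rate at 1 is an added convenience and not part of the model, and the expression for $n_{\max}$ simply follows from setting $p_e \cdot n^{\alpha} = 1$.

```python
def block_error_rate(n: int, p_e: float, concentration: float) -> float:
    """p_B(n) = p_e * n**alpha with alpha = 1 - concentration, clipped at 1."""
    alpha = 1.0 - concentration
    return min(1.0, p_e * n ** alpha)

def n_max(p_e: float, concentration: float) -> float:
    """Block length at which p_e * n**alpha reaches 1."""
    alpha = 1.0 - concentration
    return p_e ** (-1.0 / alpha)

# Same bit error rate, without and with error concentration (memory).
for concentration in (0.0, 0.3, 0.5):
    print(concentration, block_error_rate(100, 1e-2, concentration),
          n_max(1e-2, concentration))
```

At the same bit error rate, a larger error concentration value gives a lower block-error rate, because the errors are bundled into fewer blocks.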
2 Basics

The error detection coding fails if a transmitted codeword is distorted by the
channel and appears as a different valid codeword at the receiver side. In this
case, the error pattern in the block is a valid codeword. The undetected error
probability $p_{BR}(n)$, as a commonly used indicator for the reliability of a given
code-channel combination, can be defined as

$$p_{BR}(n) = R \cdot p_B(n), \qquad (4)$$

and describes the ratio of undetected faulty blocks to transmitted blocks. With
a coding strategy, the reduction factor R, as the ratio of undetected faulty blocks
to faulty blocks, should be kept as small as possible. The code determines which
number $v_{l,g}$ of error patterns of the length l with the weight g are valid codewords
and therefore not detectable. In a block of the length n,

$$\binom{l-2}{g-2} \cdot (n - l + 1) \qquad (5)$$

patterns with the criteria l and g can be found. If the two outer erroneous bits
terminate the error pattern, there exist $\binom{l-2}{g-2}$ possible error patterns. They may
appear in total $(n - l + 1)$ times in a block of length n (Fig. 2). Altogether $2^n - 1$
error patterns are possible.




Fig. 2. Possible error patterns in a block of the length n = 7 with the length l = 5 and
weight g = 3: $\binom{l-2}{g-2} = \binom{3}{1} = 3$ possible error patterns, and $(n - l + 1) = (7 - 5 + 1) = 3$ possible positions (shown for the second error pattern)
Example 1. Block length, error weight and error length

As an example, a block of length n = 7 with an error pattern of length l = 5
and weight g = 3 is considered. In such a case, there may appear

$$\binom{l-2}{g-2} \cdot (n - l + 1) = \binom{3}{1} \cdot (7 - 5 + 1) = 9$$

possible error patterns. This is illustrated in Fig. 2.

The rate of patterns with the criteria l and g is then given by

$$a_{l,g} = \frac{\binom{l-2}{g-2} \cdot (n - l + 1)}{2^n - 1}. \qquad (6)$$

Of the $\binom{l-2}{g-2}$ possible patterns, $v_{l,g}$ are not detectable, since they are
valid codewords. The generation of the valid codewords and the determination
of their length-weight distribution are explained in Section 3. The probability
$y_{l,g}$ that an error pattern of the length l with the weight g is not detectable is
given by

$$y_{l,g} = \frac{v_{l,g}}{\binom{l-2}{g-2}}. \qquad (7)$$

Considering error patterns of the same probability, the rate

$$a_{l,g} \cdot y_{l,g} \qquad (8)$$

is not detectable. In the real channel, $a_{l,g}$ is now substituted by

$$x_{l,g} = \frac{P(l,g,n)}{p_B(n)}, \qquad (9)$$

which is the probability of an error pattern of the length l with weight g
in a faulty block of the length n. The probability that a block of the length
n is distorted by an error pattern of the length l with the weight g is given
by P(l,g,n). This modification is necessary because the error patterns are not
equally distributed on real channels. In Fig. 3 and 4 the probabilities $x_{l,g}$ on
channels without and with memory are shown, respectively. With the model used
here, the classical memoryless channel is specified by an error concentration value
of $(1 - \alpha) = 0$, whereas an error concentration value greater than zero describes
a channel with memory [3]. The reduction factor R achieved by coding can be defined as

$$R = \sum_{l=k+1}^{n} \sum_{g=2}^{l} x_{l,g} \cdot y_{l,g}, \qquad (10)$$

since, with the assumption of cyclic (n, m) coding (m bits of information and
(n − m) = k parity bits), patterns up to the length k are detectable [9], [1]. The
value $x_{l,g}$ in (10) accounts for the channel characteristic and the value $y_{l,g}$ for the code
characteristic.
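The quantities of this section translate into a few short functions. The sketch below is an illustration only: the function names are hypothetical, and the channel values $x_{l,g}$ passed to `reduction_factor` in the usage lines are made-up placeholders, since in the paper they come from channel simulations.

```python
from fractions import Fraction
from math import comb

def pattern_count(n: int, l: int, g: int) -> int:
    """Number of error patterns of length l and weight g in a block of length n, Eq. (5)."""
    return comb(l - 2, g - 2) * (n - l + 1)

def undetected_fraction(v_lg: int, l: int, g: int) -> Fraction:
    """y_{l,g} of Eq. (7): share of the (l, g) patterns that are valid codewords."""
    return Fraction(v_lg, comb(l - 2, g - 2))

def reduction_factor(x: dict, y: dict, n: int, k: int):
    """R of Eq. (10); x and y are keyed by (l, g), missing keys count as 0."""
    return sum(x.get((l, g), 0) * y.get((l, g), 0)
               for l in range(k + 1, n + 1) for g in range(2, l + 1))

print(pattern_count(7, 5, 3))                 # Example 1: 3 * 3 = 9
y = {(4, 3): undetected_fraction(1, 4, 3)}    # one of two patterns is a codeword
x = {(4, 3): Fraction(1, 4)}                  # hypothetical channel value
print(reduction_factor(x, y, n=7, k=3))
```

Keeping x (channel) and y (code) in separate tables mirrors the separation in (10): the code table is computed once per generator polynomial, the channel table once per channel setting.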
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Fig. 3. Probability xi^g with pe = 10 (1 — a) = 0.0, n = 7, 




Fig. 4. Probability xi^g with pe = 10 (1 — q) = 0.5, n = 7, 



3 Determination of the Code Characteristics 

For the determination of the undetected error probability, the valid codewords
must be classified by their lengths and their weights. It is assumed that the
pattern in Fig. 1 represents a valid codeword. It has the length l = 6 and weight
g = 3. If the generator polynomial G(z) is known, the valid codewords C(z) can
be constructed by multiplying it by the information polynomial N(z):

$$C(z) = G(z) \cdot N(z). \qquad (11)$$

For the generation of the codewords with the criteria l and g, only the information
polynomials $N_c(z) \in N(z)$ which are not divisible by z have to be taken into
account, since cyclic shifting does not affect the codeword length-weight
distribution. Thus, the information polynomials N(z) are considered for which

$$N(z) \bmod z \ne 0 \qquad (12)$$

holds. This is illustrated in Fig. 5 for a (7,4) Hamming code (generator polynomial
$G(z) = z^3 + z + 1$) with the exemplary information polynomials
$N_1(z) = 1$ and $N_2(z) = z$. In both cases the codewords have the same characteristic.
The resulting codeword length-weight distribution for a (7,4) Hamming
code is given in Tab. 1.

Table 1. Elements $v_{l,g}$ of the (7,4) Hamming code

  g \ l    4    5    6    7
    3      1    -    1    1
    4      -    1    1    2
    7      -    -    -    1

The total number of codewords is obtained by a
multiplication of $v_{l,g}$ by $(n - l + 1)$, since a pattern with the criteria l and g can
altogether occur $(n - l + 1)$ times within a codeword with cyclic shifting. This
approach leads to the well-known weight spectrum of codewords [10].

The probability $y_{l,g}$ of not detecting an error pattern with the criteria l and
g is obtained from (7). An error is not detectable if it corresponds to a valid
codeword. Due to the error detecting properties of cyclic codes, the position of
the patterns with the criteria l and g within the block is of no interest.
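The length-weight distribution of the (7,4) Hamming code can be generated by enumerating the information polynomials that are not divisible by z. The sketch below stores polynomials as integers (bit i is the coefficient of $z^i$); the function names are illustrative.

```python
from collections import Counter

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of two polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def length_weight_distribution(gen: int, n: int) -> Counter:
    """Count codewords C(z) = G(z) N(z) with N(z) mod z != 0 by (length l, weight g)."""
    info_bits = n - (gen.bit_length() - 1)
    table = Counter()
    for info in range(1, 1 << info_bits, 2):      # odd masks: constant term of N(z) is 1
        c = clmul(gen, info)
        # G(z) and N(z) both have constant term 1, so the pattern starts at bit 0
        # and its length is simply the bit length of the product.
        table[(c.bit_length(), bin(c).count("1"))] += 1
    return table

n = 7
v = length_weight_distribution(0b1011, n)          # G(z) = z^3 + z + 1
for (l, g), count in sorted(v.items()):
    print(f"l={l} g={g} v={count}")
print(sum(count * (n - l + 1) for (l, g), count in v.items()))
```

Weighting each entry by $(n - l + 1)$ and summing gives 15, the number of nonzero codewords of the (7,4) code, which is the multiplication described in the text.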

Example 2. Determination of $y_{l,g}$ for l = 4, g = 3 for the (7,4) Hamming code

$$\binom{l-2}{g-2} = \binom{2}{1} = 2 \quad \text{and} \quad v_{l,g} = v_{4,3} = 1 \ \text{(see Tab. 1)} \quad \Rightarrow \quad y_{l,g} = y_{4,3} = \frac{1}{2}$$

Fig. 5. Characteristics of the valid codewords for a (7,4) Hamming code (Codeword 1: $N_1(z) = 1$, $C_1(z) = 1 + z + z^3$; Codeword 2: $N_2(z) = z$, $C_2(z) = z \cdot C_1(z)$; both with n = 7, l = 4, g = 3)
Only one of the two possible error patterns with the criteria l = 4 and g = 3
is a valid codeword. Therefore the probability of not detecting an error pattern
with l = 4 and g = 3 amounts to 0.5. The probability of the appearance of such
an error pattern is governed by the channel characteristics.

For the investigated polynomials (Code-1 and Code-2) the probabilities $y_{l,g}$
are depicted in Fig. 6 and 7. The importance of the generator codeword (l =
17, g = 4) for the determination of the reduction factor and the undetected error
probability becomes obvious.




Fig. 6. Code-1 with n = 30 



4 Results 

In this section, the reliability of different code-channel combinations is analyzed.
To take the channel characteristics into account, the error patterns in faulty
blocks must be classified by their error-length and error-weight. The required
probabilities $x_{l,g}$ can be determined by simulations, using the digital channel
model suggested in [3]. The digital channel model is available at [4].

Results of the analysis of both generator polynomials, Code-1 and Code-2,
are shown in Fig. 8 and 9. For the investigations, an error concentration value
of $(1 - \alpha) = 0.3$ was chosen, as it was found by analyzing real shortwave
channel connections in Central Europe [3], [5], [9]. An error concentration value
of $(1 - \alpha) = 0$ describes the classical memoryless channel. The superiority of
Code-1 on channels with memory is illustrated. This analysis led to the result
that codes of the same length are efficient on channels with memory if the
exponents in the generator polynomial are equidistantly distributed. Code-2
fulfils this condition less well than Code-1. For this reason,
the superiority of Code-1 over Code-2 becomes obvious on channels with memory:
Code-1 generally leads to a lower undetected error probability on
channels with memory than Code-2.

Fig. 7. Code-2 with n = 30

Fig. 8. Code-1 with n = 30 (panel title: CRC-CCITT, Code-1; axis label: bit error rate)

The analysis of the code characteristic
has shown that the structure of the generator polynomial has a dominating
influence on the undetected error probability. For the
analysis of these properties, the error-gap density function $v(b) = P(X = b)$
on channels with memory was used, which describes the probability that after
an error the gap to the next error is b intervals long [3]. This makes
clear why Code-2 performs worse than Code-1 for $(1 - \alpha) = 0.3$. The
probability that an error is followed by another error at the distance b = 0
increases with increasing error concentration value (Fig. 10). But exactly such a
combination (the terms $z^1$ and $z^0 = 1$ in the generator polynomial of Code-2) leads
to a valid codeword for Code-2. Exactly the reverse behaviour is found for
Code-1: with an increased error concentration value $(1 - \alpha)$, the value of $v(0)$
increases too, but this does not lead to a valid codeword. Thus, a generator
polynomial with equidistantly distributed exponents results in codes with good
undetected error probability characteristics on channels with memory, since the
generator polynomial dominates the undetected error probability. Looking at
Fig. 9, it also becomes obvious why Code-2 performs worse on channels with memory
than on memoryless channels. Since the value of $v(0)$ increases with increasing
error concentration value, and since such a constellation
(two or more adjacent coefficients in the generator polynomial are nonzero) leads
to a valid codeword, the loss in efficiency can be explained. In
Code-1 such a combination is absent, and approximately the same performance
is achieved.

Fig. 9. Code-2 with n = 30



5 Conclusion 



By the classification of the error patterns by error-length and error-weight
suggested in this work, we are able to match a given code to a given
channel characteristic. The proposed setup takes the characteristics of
the shortwave channel into account in an optimal way when determining the
undetected error probability. It is shown that codes of the same rate and
length are efficient if the exponents in the generator polynomial are equidistantly
distributed. For data transmission over shortwave channels, the CRC-CCITT
code (Code-1) shows the best results.
Fig. 10. Error-gap density function for short error-gaps b



Acknowledgement. We thank Prof. R. Kohlschmidt and Prof. R. Rockmann 
of Rostock University for their support of our work and Dr. C. Wilhelm for 
valuable comments. 



References 

1. Ahrens, A., Wilhelm, C.: Bestimmung des Reduktionsfaktors bei Einsatz zyklischer 
Codes auf Kanälen mit gebündelt auftretenden Übertragungsfehlern. 10. Symposium 
Maritime Elektronik, Arbeitskreis Maritime Mess- und Informationselektronik, 
June 2001, Universität Rostock, Germany, (141-144) 

2. Ahrens, A., Lange, C.: On the Probability of Undetected Error in Protocol 
Structures. Nordic Shortwave Conference (Nordic HF 01), August 2001, Fårö, Sweden, 
(3.1.1-3.1.8) 

3. Ahrens, A.: A New Digital Channel Model Suitable for the Simulation and 
Evaluation of Channel Error Effects. Colloquium on Speech Coding Algorithms for 
Radio Channels, IEE Electronics and Communications, April 2000, London, UK, 
Reference Number 2000/030 

4. http://www-nt.e-technik.uni-rostock.de/nt/english/fg_nt.html 

5. Ahrens, A., Greiner, G.: A Radio Protocol for TCP/IP Application. Colloquium 
on Frequency Selection and Management Techniques for HF Communication, IEE 
Electronics and Communications, March 1999, London, UK, Reference Number 
1999/017, 11/1-11/6 

6. Fujiwara, T., Kasami, T., Kitai, A., Lin, S.: On the Undetected Error Probability 
for Shortened Hamming Codes. IEEE Transactions on Communications 33 (1985) 
570-574 

7. Leung, C.: Evaluation of the Undetected Error Probability of Single Parity-Check 
Product Codes. IEEE Transactions on Communications 31 (1983) 250-253 

8. Wilhelm, C.: Über den Zusammenhang zwischen Kanal und Codierung bei der 
Datenfernübertragung. Nachrichtentechnik-Elektronik 17 (1967) 386-392 

9. Wilhelm, C.: Datenübertragung. Militärverlag, Berlin (1976) 








10. Wozencraft, J. M., Jacobs, I. M.: Principles of Communication Engineering. 
Waveland Press, Prospect Heights (Illinois) (1990) 

11. Wolf, J., Michelson, A., Levesque, A.: On the Probability of Undetected Error for 
Linear Block Codes. IEEE Transactions on Communications 30 (1982) 317-324 

12. Wong, B., Leung, C.: On Computing Undetected Error Probabilities on the Gilbert 
Channel. IEEE Transactions on Communications 43 (1995) 2657-2661 

13. Witzke, B., Leung, C.: A Comparison of some Error Detecting CRC Code Stan- 
dards. IEEE Transactions on Communications 33 (1985) 996-998 




The Complete Weight Enumerator for Codes 
over $M_{n\times s}(\mathbb{F}_q)$ 



Irfan Siap 

Sakarya University, Sakarya, Turkey. 
isiap@sakarya.edu.tr, 
WWW home page: http://www.sakarya.edu.tr/~isiap 



Abstract. The MacWilliams identity for codes over $M_{n\times s}(\mathbb{F}_q)$ 
endowed with a non-Hamming metric is proved in [1]. We introduce a complete 
weight enumerator for these codes and prove a MacWilliams identity 
with respect to this new metric for the complete weight enumerator. 



1 Introduction 

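Let $\mathbb{F}_q = \{0 = \alpha_0, \alpha_1, \ldots, \alpha_{q-1}\}$ denote the 
finite field with $q$ elements. Let $M_{n\times s}(\mathbb{F}_q)$ denote the set 
of $n \times s$ matrices over $\mathbb{F}_q$. A new non-Hamming metric $\rho$ on 
linear spaces over finite fields has recently been introduced in [3]. Let 
$\omega = (p_0, p_1, \ldots, p_{s-1}) \in M_{1\times s}(\mathbb{F}_q)$. Then, 

\[ \rho(\omega) = \begin{cases} \max\{i \mid p_i \neq 0\} + 1, & \omega \neq 0, \\ 0, & \omega = 0. \end{cases} \tag{1} \] 

Let $\Omega = (\omega_1, \omega_2, \ldots, \omega_n)^T \in M_{n\times s}(\mathbb{F}_q)$ 
and define $\rho(\Omega) = \sum_{i=1}^{n} \rho(\omega_i)$. 

$\rho$ is a metric over $M_{n\times s}(\mathbb{F}_q)$. 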
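
The definition of the $\rho$ weight can be sketched in a few lines of Python. This is only an illustration, assuming field elements are represented by integers with 0 as the zero element; the function names `rho_row`, `rho` and `hamming_weight` are ours and do not appear in the paper.

```python
def rho_row(row):
    """rho weight of one row (p_0, ..., p_{s-1}), following Eq. (1)."""
    nonzero = [i for i, p in enumerate(row) if p != 0]
    return max(nonzero) + 1 if nonzero else 0


def rho(matrix):
    """rho weight of an n x s matrix: the sum of the rho weights of its rows."""
    return sum(rho_row(row) for row in matrix)


def hamming_weight(matrix):
    """Number of nonzero entries, for comparison with rho."""
    return sum(1 for row in matrix for p in row if p != 0)
```

For a single column (s = 1) every nonzero entry contributes exactly 1, so `rho` and `hamming_weight` agree in that case, which matches the remark that the $\rho$ metric reduces to the Hamming metric for s = 1.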

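Definition 1. A linear subspace $C$ of $M_{n\times s}(\mathbb{F}_q)$ is called a linear code. 

Let $C \subseteq M_{n\times s}(\mathbb{F}_q)$ be a linear code. 
$W_r(C) = |\{\Omega \in C \mid \rho(\Omega) = r\}|$, $0 \le r \le ns$, is called 
the $\rho$ weight spectrum of the code, and the $\rho$ weight enumerator 
is defined as follows: 

\[ W(C \mid z) = \sum_{r=0}^{ns} W_r(C)\, z^r = \sum_{\Omega \in C} z^{\rho(\Omega)}. \tag{2} \] 

Let $\omega_1 = (p_0, p_1, \ldots, p_{s-1})$, $\omega_2 = (q_0, q_1, \ldots, q_{s-1}) \in M_{1\times s}(\mathbb{F}_q)$. 
The inner product of $\omega_1$ and $\omega_2$ is defined by 

\[ \langle \omega_1, \omega_2 \rangle = \sum_{i=0}^{s-1} p_i\, q_{s-1-i} \tag{3} \] 

and this is extended to the inner product of $\Omega_1 = (\omega_1, \ldots, \omega_n)^T$, 
$\Omega_2 = (\mu_1, \ldots, \mu_n)^T \in M_{n\times s}(\mathbb{F}_q)$ by 

\[ \langle \Omega_1, \Omega_2 \rangle = \sum_{i=1}^{n} \langle \omega_i, \mu_i \rangle. \tag{4} \] 

The dual code of $C$ is defined by 

\[ C^\perp = \{\Omega_2 \in M_{n\times s}(\mathbb{F}_q) \mid \langle \Omega_2, \Omega_1 \rangle = 0 \text{ for all } \Omega_1 \in C\}, \tag{5} \] 

and $C^\perp$ is also a linear code of length $n$. 

For $s = 1$ and arbitrary $n$ the $\rho$ metric coincides with the Hamming metric 
and the MacWilliams identity is given in [2]. For $n = 1$ and arbitrary $s$ the 
MacWilliams identity is given in [4]. Now we consider the following example 
given in [1]. 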

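Example 1: Let 

\[ C_1 = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \right\}, \qquad C_2 = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\} \] 

be linear codes over $\mathbb{F}_2$. The $\rho$ weight enumerator of the above codes is 

\[ W(C_1 \mid z) = W(C_2 \mid z) = 1 + z^2. \tag{6} \] 

The dual codes of $C_1$ and $C_2$ are 

\[ C_1^\perp = \left\{ \begin{pmatrix} a_0 & a_1 \\ b_0 & b_1 \end{pmatrix} \in M_{2\times 2}(\mathbb{F}_2) \;\middle|\; a_1 + b_1 = 0 \right\}, \qquad C_2^\perp = \left\{ \begin{pmatrix} a_0 & a_1 \\ b_0 & b_1 \end{pmatrix} \in M_{2\times 2}(\mathbb{F}_2) \;\middle|\; b_0 = 0 \right\}. \] 

The $\rho$ weight enumerators of $C_1^\perp$ and $C_2^\perp$ are 

\[ W(C_1^\perp \mid z) = 1 + 4z^4 + z^2 + 2z, \qquad W(C_2^\perp \mid z) = 1 + 2z^4 + z^3 + 3z^2 + z. \tag{7} \] 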

As seen above, although the $\rho$ weight enumerators of the codes $C_1$ and $C_2$ 
are the same, the $\rho$ weight enumerators of the duals are different. To overcome 
this problem, in [1], the orbits of a linear group which preserves the metric $\rho$ are 
considered. It is shown that the MacWilliams identity over such orbits holds [4]. 
In the next section, we propose a complete weight enumerator which overcomes 
this problem and carries more information about the code. 

2 The complete weight enumerator 

Let us consider the matrices $A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$ and 
$B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$. Then $\rho(A) = \rho(B) = 2$. 

However, the rows of the matrices $A$ and $B$ have different structures, and it 
is natural to expect the problem that occurred in the above example. The weight 
$\rho$ strongly depends on the order of the elements of the rows. It is possible to 
overcome the problem that occurred in the above example by defining a weight 
enumerator that preserves the order of the entries of the matrices and carries 
more information about the code. Before we give the definition of the complete 
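weight enumerator we make the following identification: 

\[ \varphi_1 : M_{1\times s}(\mathbb{F}_q) \to \mathbb{F}_q[x]/(x^s), \qquad P = (p_0, p_1, \ldots, p_{s-1}) \mapsto p_0 + p_1 x + \cdots + p_{s-1} x^{s-1}. \] 

Let $P = (P_1, \ldots, P_n)^T$ where $P_i = (p_{i0}, p_{i1}, \ldots, p_{i,s-1})$ for 
$1 \le i \le n$. We extend $\varphi_1$ to 

\[ \varphi : M_{n\times s}(\mathbb{F}_q) \to M_{n\times 1}(\mathbb{F}_q[x]/(x^s)), \] 
\[ P \mapsto (p_{10} + p_{11} x + \cdots + p_{1,s-1} x^{s-1}, \ldots, p_{n0} + p_{n1} x + \cdots + p_{n,s-1} x^{s-1})^T. \] 

The maps defined above are vector space isomorphisms over $\mathbb{F}_q$. The $\rho$ 
weight of a nonzero polynomial $p(x) \in \mathbb{F}_q[x]/(x^s)$ is simply 
$\deg(p(x)) + 1$, i.e. 

\[ \rho(p(x)) = \deg(p(x)) + 1. \tag{8} \] 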

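Let $p(x) = p_0 + \cdots + p_{s-1} x^{s-1} \in \mathbb{F}_q[x]/(x^s)$. The $l$th 
($0 \le l \le s-1$) coefficient of $p(x)$ is defined by 

\[ c_l(p(x)) = p_l. \tag{9} \] 

Let $P(x) = (P_1(x), \ldots, P_n(x))^T$ and $Q(x) = (Q_1(x), \ldots, Q_n(x))^T \in M_{n\times 1}(\mathbb{F}_q[x]/(x^s))$, 
where $P_i(x) = p_{i0} + p_{i1} x + \cdots + p_{i,s-1} x^{s-1}$ and 
$Q_i(x) = q_{i0} + q_{i1} x + \cdots + q_{i,s-1} x^{s-1}$. 

The inner product of $P(x)$ and $Q(x)$ defined above becomes: 

\[ \langle P(x), Q(x) \rangle = \sum_{i=1}^{n} c_{s-1}(P_i(x) Q_i(x)). \tag{10} \] 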

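The Hamming weight of an element $\alpha \in \mathbb{F}_q$ is defined by 

\[ w(\alpha) = \begin{cases} 0, & \text{if } \alpha = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{11} \] 

Let $C \subseteq M_{n\times s}(\mathbb{F}_q)$ be a linear code with size $m$. For 
simplification purposes, let $C = \{A_0, A_1, \ldots, A_{m-1}\}$. Also, let 

\[ A_l = \begin{pmatrix} a_{10}^{(l)} & a_{11}^{(l)} & \cdots & a_{1,s-1}^{(l)} \\ a_{20}^{(l)} & a_{21}^{(l)} & \cdots & a_{2,s-1}^{(l)} \\ \vdots & \vdots & & \vdots \\ a_{n0}^{(l)} & a_{n1}^{(l)} & \cdots & a_{n,s-1}^{(l)} \end{pmatrix}, \qquad 0 \le l \le m-1. \] 

Let 

\[ Y_{ns} = (y_{10}, \ldots, y_{1,s-1}, \ldots, y_{n0}, \ldots, y_{n,s-1}). \] 

We define the complete $\rho$ weight enumerator of a code $C$ by 

\[ W_C(Y_{ns}) = \sum_{l=0}^{m-1} y_{10}^{w(a_{10}^{(l)})} \cdots y_{1,s-1}^{w(a_{1,s-1}^{(l)})} \cdots y_{n0}^{w(a_{n0}^{(l)})} \cdots y_{n,s-1}^{w(a_{n,s-1}^{(l)})}. \tag{12} \] 

Note that the complete $\rho$ weight enumerator is a polynomial in $ns$ variables. 
Further, it is possible to obtain the $\rho$ weight enumerator by specializing the 
complete $\rho$ weight enumerator. 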

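Example 2: The complete $\rho$ weight enumerators of the codes $C_1$, $C_2$, 
$C_1^\perp$ and $C_2^\perp$ (Example 1) are 

\begin{align*} 
W_{C_1}(Y_{22}) &= 1 + y_{10} y_{20}, \\ 
W_{C_2}(Y_{22}) &= 1 + y_{21}, \\ 
W_{C_1^\perp}(Y_{22}) &= 1 + y_{11} y_{21} + y_{10} y_{11} y_{20} y_{21} + y_{10} y_{11} y_{21} + y_{11} y_{20} y_{21} + y_{10} + y_{20} + y_{10} y_{20}, \\ 
W_{C_2^\perp}(Y_{22}) &= 1 + y_{10} + y_{11} + y_{21} + y_{10} y_{11} + y_{10} y_{21} + y_{11} y_{21} + y_{10} y_{11} y_{21}. 
\end{align*} 

By letting $y_{10} = y_{20} = z$, $y_{11} = y_{21} = z^2$, $y_{10} y_{11} = z^2$ and 
$y_{20} y_{21} = z^2$, we obtain the $\rho$ weight enumerators (6). 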
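
The enumerators of Examples 1 and 2 can be checked by brute force over all 2 x 2 binary matrices. The sketch below is ours, not the author's: it assumes $C_1 = \{0, A\}$ and $C_2 = \{0, B\}$ over $\mathbb{F}_2$ with A having rows (1,0), (1,0) and B having rows (0,0), (0,1), which is what the complete weight enumerators $1 + y_{10}y_{20}$ and $1 + y_{21}$ indicate. Monomials are represented by the set of index pairs (i, j) of the variables $y_{ij}$ they contain.

```python
from collections import Counter
from itertools import product

N_ROWS, S = 2, 2  # n = s = 2 over F_2, as in Examples 1 and 2


def all_matrices():
    """All N_ROWS x S matrices over F_2, as tuples of row tuples."""
    for bits in product((0, 1), repeat=N_ROWS * S):
        yield tuple(tuple(bits[r * S:(r + 1) * S]) for r in range(N_ROWS))


def inner(a, b):
    """Inner product (3)-(4): sum over rows of sum_i p_i * q_{s-1-i}, mod 2."""
    return sum(ra[i] * rb[S - 1 - i] for ra, rb in zip(a, b) for i in range(S)) % 2


def dual(code):
    """All matrices orthogonal to every codeword, Eq. (5)."""
    return [m for m in all_matrices() if all(inner(m, c) == 0 for c in code)]


def rho(m):
    """rho weight of a matrix: per row, index of last nonzero entry plus one."""
    return sum(max((j + 1 for j, v in enumerate(row) if v), default=0) for row in m)


def support(m):
    """Index pairs (i, j) of nonzero entries; rows count from 1, columns from 0."""
    return frozenset((i + 1, j) for i, row in enumerate(m) for j, v in enumerate(row) if v)


def complete_enumerator(code):
    """Monomials of the complete rho weight enumerator (coefficients are 1 over F_2)."""
    return {support(m) for m in code}


def rho_enumerator(code):
    """Counter mapping r to the number of codewords of rho weight r."""
    return Counter(rho(m) for m in code)


ZERO = ((0, 0), (0, 0))
C1 = [ZERO, ((1, 0), (1, 0))]  # assumed reading of C_1
C2 = [ZERO, ((0, 0), (0, 1))]  # assumed reading of C_2
```

With these assumptions `rho_enumerator(C1)` and `rho_enumerator(C2)` coincide, while `rho_enumerator(dual(C1))` and `rho_enumerator(dual(C2))` differ, which is the phenomenon the example is meant to show.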

The following lemmas are going to play an important role in the proof of the 
main theorem. 



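Lemma 1. [2] Let $\chi$ be a nontrivial additive character of $\mathbb{F}_q$. Let 
$\beta$ be a fixed element of $\mathbb{F}_q$. Then, 

\[ \sum_{i=0}^{q-1} \chi(\beta \alpha_i) = \begin{cases} q, & \beta = 0, \\ 0, & \beta \neq 0. \end{cases} \] 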



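Lemma 2. Let $\chi$ be a nontrivial additive character of $\mathbb{F}_q$. Then, 

\[ \sum_{P(x) \in C} \chi(\langle P(x), Q(x) \rangle) = \begin{cases} 0, & Q(x) \notin C^\perp, \\ |C|, & Q(x) \in C^\perp. \end{cases} \tag{13} \] 

Proof. If $Q(x) \in C^\perp$, then it is clear. If $Q(x) \notin C^\perp$, then there 
exists $P(x) \in C$ such that $\langle P(x), Q(x) \rangle \neq 0$. Let 
$\langle P(x), Q(x) \rangle = \gamma$. Then, the map 

\[ \varphi_{Q(x)} : C \to \mathbb{F}_q, \qquad P(x) \mapsto \langle P(x), Q(x) \rangle = \sum_{i=1}^{n} c_{s-1}(P_i(x) Q_i(x)) \] 

is $\mathbb{F}_q$-linear and onto. Thus, $C / \mathrm{Ker}(\varphi_{Q(x)}) \cong \mathbb{F}_q$. Hence, 

\[ \sum_{P(x) \in C} \chi(\langle P(x), Q(x) \rangle) = |\mathrm{Ker}(\varphi_{Q(x)})| \sum_{\alpha \in \mathbb{F}_q} \chi(\alpha) = 0, \] 

by Lemma 1. □ 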



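Lemma 3. Let $\chi$ be a nontrivial additive character of $\mathbb{F}_q$ and $i, j$ 
be fixed. Let $p(x) = p_{i0} + p_{i1} x + \cdots + p_{i,s-1} x^{s-1} \in \mathbb{F}_q[x]/(x^s)$. Then, 

\[ \sum_{\alpha \in \mathbb{F}_q} \chi(\langle p(x), \alpha x^j \rangle)\, y_{ij}^{w(\alpha)} = (1 + (q-1) y_{ij})^{1 - w(p_{i,s-1-j})} (1 - y_{ij})^{w(p_{i,s-1-j})}. \] 

Proof. 

\begin{align*} 
\sum_{\alpha \in \mathbb{F}_q} \chi(\langle p(x), \alpha x^j \rangle)\, y_{ij}^{w(\alpha)} 
&= \sum_{\alpha \in \mathbb{F}_q} \chi(c_{s-1}(p(x)(\alpha x^j)))\, y_{ij}^{w(\alpha)} \\ 
&= \sum_{\alpha \in \mathbb{F}_q} \chi(p_{i,s-1-j}\, \alpha)\, y_{ij}^{w(\alpha)} \\ 
&= 1 + \sum_{k=1}^{q-1} \chi(p_{i,s-1-j}\, \alpha_k)\, y_{ij} \\ 
&= \begin{cases} 1 + (q-1) y_{ij}, & p_{i,s-1-j} = 0, \\ 1 - y_{ij}, & p_{i,s-1-j} \neq 0. \end{cases} 
\end{align*} 

(Lemma 1.) □ 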



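Lemma 4. Let $f : M_{n\times 1}(\mathbb{F}_q[x]/(x^s)) \to \mathbb{C}[y_{10}, \ldots, y_{n,s-1}]$ 
and $\chi$ be a nontrivial additive character of $\mathbb{F}_q$. Then, 

\[ \sum_{Q(x) \in C^\perp} f(Q(x)) = \frac{1}{|C|} \sum_{P(x) \in C} \hat{f}(P(x)), \] 

where $\hat{f}(P(x)) = \sum_{Q(x) \in M_{n\times 1}(\mathbb{F}_q[x]/(x^s))} \chi(\langle P(x), Q(x) \rangle)\, f(Q(x))$, 
$P(x) = (P_1(x), \ldots, P_n(x))^T$ and $Q(x) = (Q_1(x), \ldots, Q_n(x))^T$. 

Proof. Let $P_i(x) = p_{i0} + \cdots + p_{i,s-1} x^{s-1}$ and 
$Q_i(x) = q_{i0} + \cdots + q_{i,s-1} x^{s-1}$ for $1 \le i \le n$. 

\begin{align*} 
\sum_{P(x) \in C} \hat{f}(P(x)) 
&= \sum_{P(x) \in C} \; \sum_{Q(x) \in M_{n\times 1}(\mathbb{F}_q[x]/(x^s))} \chi(\langle P(x), Q(x) \rangle)\, f(Q(x)) \\ 
&= \sum_{Q(x) \in C^\perp} \; \sum_{P(x) \in C} \chi(\langle P(x), Q(x) \rangle)\, f(Q(x)) 
 + \sum_{Q(x) \notin C^\perp} \; \sum_{P(x) \in C} \chi(\langle P(x), Q(x) \rangle)\, f(Q(x)) \\ 
&= |C| \sum_{Q(x) \in C^\perp} f(Q(x)) \qquad \text{(by Lemma 2).} \quad □ 
\end{align*} 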



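Theorem 1. 

\[ \sum_{Q(x) \in C^\perp} y_{10}^{w(q_{10})} \cdots y_{1,s-1}^{w(q_{1,s-1})} \cdots y_{n0}^{w(q_{n0})} \cdots y_{n,s-1}^{w(q_{n,s-1})} 
 = \frac{1}{|C|} \left( \prod_{i=1}^{n} \prod_{j=0}^{s-1} (1 + (q-1) y_{ij}) \right) \sum_{P(x) \in C} \prod_{k=1}^{n} \prod_{l=0}^{s-1} \left( \frac{1 - y_{kl}}{1 + (q-1) y_{kl}} \right)^{w(p_{k,s-1-l})}. \] 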

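Proof. We take 

\[ f((Q_1(x), \ldots, Q_n(x))^T) = y_{10}^{w(q_{10})} \cdots y_{1,s-1}^{w(q_{1,s-1})} \cdots y_{n0}^{w(q_{n0})} \cdots y_{n,s-1}^{w(q_{n,s-1})} \] 

in Lemma 4. Then, 

\begin{align*} 
\hat{f}(P(x)) 
&= \sum_{Q(x) \in M_{n\times 1}(\mathbb{F}_q[x]/(x^s))} \chi(\langle P(x), Q(x) \rangle)\, y_{10}^{w(q_{10})} \cdots y_{n,s-1}^{w(q_{n,s-1})} \\ 
&= \sum_{Q(x) \in M_{n\times 1}(\mathbb{F}_q[x]/(x^s))} \; \prod_{i=1}^{n} \chi(\langle P_i(x), Q_i(x) \rangle)\, y_{i0}^{w(q_{i0})} \cdots y_{i,s-1}^{w(q_{i,s-1})} \\ 
&= \prod_{i=1}^{n} \prod_{j=0}^{s-1} \; \sum_{q_{ij} \in \mathbb{F}_q} \chi(\langle P_i(x), q_{ij} x^j \rangle)\, y_{ij}^{w(q_{ij})}. 
\end{align*} 

Applying Lemma 3, 

\begin{align*} 
\hat{f}(P(x)) 
&= \prod_{i=1}^{n} \prod_{j=0}^{s-1} (1 + (q-1) y_{ij})^{1 - w(p_{i,s-1-j})} (1 - y_{ij})^{w(p_{i,s-1-j})} \\ 
&= \left( \prod_{i=1}^{n} \prod_{j=0}^{s-1} (1 + (q-1) y_{ij}) \right) \prod_{k=1}^{n} \prod_{l=0}^{s-1} \left( \frac{1 - y_{kl}}{1 + (q-1) y_{kl}} \right)^{w(p_{k,s-1-l})}. \quad □ 
\end{align*} 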
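
As a sanity check, the identity can be evaluated numerically on the small binary codes of Example 1. The sketch below is ours and rests on two assumptions: that Theorem 1 reads $\sum_{Q \in C^\perp} \prod y_{ij}^{w(q_{ij})} = \frac{1}{|C|} \prod_{i,j} (1 + (q-1) y_{ij}) \sum_{P \in C} \prod_{k,l} \left( \frac{1 - y_{kl}}{1 + (q-1) y_{kl}} \right)^{w(p_{k,s-1-l})}$, and that $C_1$, $C_2$ are the two-element codes generated by the matrices with rows (1,0), (1,0) and (0,0), (0,1). Exact rational arithmetic is used, so both sides must agree exactly at the chosen (arbitrary) evaluation point.

```python
from fractions import Fraction
from itertools import product

Q, N_ROWS, S = 2, 2, 2  # binary field, 2 x 2 matrices


def all_matrices():
    for bits in product((0, 1), repeat=N_ROWS * S):
        yield tuple(tuple(bits[r * S:(r + 1) * S]) for r in range(N_ROWS))


def inner(a, b):
    """Inner product of two matrices: sum over rows of sum_i p_i * q_{s-1-i}, mod 2."""
    return sum(ra[i] * rb[S - 1 - i] for ra, rb in zip(a, b) for i in range(S)) % 2


def dual(code):
    return [m for m in all_matrices() if all(inner(m, c) == 0 for c in code)]


def w(a):
    """Hamming weight of a field element."""
    return 0 if a == 0 else 1


def lhs(code, y):
    """Complete rho weight enumerator of the dual code, evaluated at y[i][j]."""
    total = Fraction(0)
    for m in dual(code):
        term = Fraction(1)
        for i in range(N_ROWS):
            for j in range(S):
                term *= y[i][j] ** w(m[i][j])
        total += term
    return total


def rhs(code, y):
    """Right-hand side of the identity; note the reversed column index s-1-l."""
    prefactor = Fraction(1)
    for i in range(N_ROWS):
        for j in range(S):
            prefactor *= 1 + (Q - 1) * y[i][j]
    total = Fraction(0)
    for p in code:
        term = Fraction(1)
        for k in range(N_ROWS):
            for l in range(S):
                ratio = (1 - y[k][l]) / (1 + (Q - 1) * y[k][l])
                term *= ratio ** w(p[k][S - 1 - l])
        total += term
    return prefactor * total / len(code)


ZERO = ((0, 0), (0, 0))
C1 = [ZERO, ((1, 0), (1, 0))]
C2 = [ZERO, ((0, 0), (0, 1))]
Y = [[Fraction(1, 2), Fraction(1, 3)], [Fraction(2, 5), Fraction(3, 7)]]
```

Evaluating at a single point does not prove the polynomial identity, but a mismatch would immediately expose an indexing error such as forgetting the reversal $s-1-l$.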
Abstract. Solutions based on error-correcting codes for the blacklisting 
problem of a broadcast distribution system have been proposed by 
Kumar, Rajagopalan and Sahai. We have optimized their schemes by 
choosing the parameters properly. In this paper, we propose a further 
improvement of their schemes. In terms of transmission, the improved 
schemes with the parameters chosen for the optimized original schemes 
are more efficient than the optimized original schemes. The average 
amount of transmission is 70 percent of that for the optimized original 
scheme in one of the typical cases, while the amount of storage 
is the same. 



1 Introduction 

Consider the distribution of digital contents over a broadcast channel where the 
contents should be available only to subscribers among the users of the broadcast 
channel, e.g., pay-TV. A data supplier gives each subscriber a decoder containing 
a secret decryption key and broadcasts the contents in encrypted form. For each 
distribution, every decoder decrypts the broadcasted contents in encrypted form 
by using its secret decryption key. We will call a system for such distribution a 
broadcast distribution system (BDS). 

As pointed out in [3], there is a desired property for a BDS: for every 
distribution, the data supplier can prevent subscribers from decrypting the 
broadcasted contents in encrypted form without renewing any secret decryption key, 
where no coalition of prevented subscribers can recover the contents from the 
broadcast. A BDS which meets the desired property is suitable for an environment 
where, for every distribution, the data supplier identifies those to whom the 
contents are to be distributed, e.g., pay-TV with various programs. The problem 
of constructing a BDS which meets the property and allows prevention of a limited 
number of subscribers is called the blacklisting problem [1,6-11]. 

In the general model of a BDS, the broadcasted contents in encrypted form 
consist of an enabling part and a cipher part. The cipher part is the symmetric 
encryption of the contents under a session key. For each distribution, a session 
key is chosen randomly. The enabling part contains the information needed to obtain 
the session key by using a secret decryption key. Under the assumption that the 
symmetric encryption algorithm used is secure, the security of distributing the 
contents is reduced to that of distributing a session key. We therefore consider 
the problem of distributing a session key securely. 

In this paper, we focus on unconditionally secure solutions for the blacklisting 
problem. In [1, 8], efficient solutions based on threshold secret sharing 
are proposed. In [6], Kumar et al. propose solutions based on an error 
correcting code and a cover-free family. The solutions in [6], called the KRS 
schemes in this paper, are unconditionally secure. Although the solutions in [1, 
8] are asymptotically more efficient than the KRS schemes, it is important to 
improve the efficiency of the KRS schemes. A reason given in [4] is that the KRS 
schemes are more appropriate than the solutions in [1,8] for long-lived broadcast 
encryption. In fact, long-lived broadcast encryption schemes based on 
the KRS schemes are proposed in [4]. 

In [10], we have analyzed the KRS schemes in detail and presented a method 
to choose the values of the parameters such that the amount of transmission is 
minimized (the results in [10] include that in [9]). In this paper, the KRS schemes 
with such a choice of parameters are called optimized KRS schemes. The 
performance of the optimized KRS schemes is shown in [10]. From the results in [10], 
the KRS schemes are suitable and practical for those sizes of a blacklisting 
problem where a relatively small amount of information, say just a single key, 
is sent, and the proportion of the maximum number of excluded subscribers 
to the number of all subscribers is small, say not more than 1%. 

Our goal is to improve the KRS schemes further. The improvement 
means decreasing the amount of transmission while the amount of storage stays 
the same. We define the enabling part more properly by using the property of 
the cover-free family. With the improved KRS schemes, there are cases where the 
amount of transmission becomes only several times as much as that in [1, 8]. 

In Sect. 2, we show the definition of a blacklisting problem. The difference 
between the improved KRS schemes and the previous KRS schemes considered 
in [4, 6, 7, 9, 10] is the definition of the enabling part. Then, in Sect. 3, we describe 
the original KRS schemes in [6]. In Sect. 4, we present the improvement and the 
detailed analysis of the improved KRS schemes. Finally, in Sect. 5, we show the 
numerical results of the improvement. 



2 Blacklisting Problem 



In this paper, we consider an $(N, K)$-blacklisting problem with $0 < K < N$. In 
the model, there is the data supplier and the group of subscribers. $N$ denotes the 
number of the subscribers. The subscribers are assumed to be numbered from 1 
to $N$. $K$ denotes the maximum number of subscribers whom the data supplier 
can prevent from recovering the session key contained in the enabling part. The 
prevented subscribers are called excluded subscribers. All subscribers except for 
excluded subscribers are called unexcluded subscribers. 

An $(N, K)$-blacklisting problem is a problem to construct a BDS which 
satisfies the following three requirements: 

(R1) Without knowing the broadcasted enabling part, no subset of subscribers 
has any information about the session key, even given all the secret decryption 
keys of the subscribers; 

(R2) An unexcluded subscriber can uniquely determine the session key from the 
broadcasted enabling part and his secret decryption key; 

(R3) After receiving the broadcasted enabling part, no subset of excluded 
subscribers has any information about the session key. 

The performance of a solution for an $(N, K)$-blacklisting problem is evaluated 
by the following three complexity measures: 

(CM1) The size of storage required by the data supplier; 

(CM2) The size of storage required by a subscriber; 

(CM3) The size of a broadcasted enabling part, called the necessary increase of 
bandwidth. 



3 Kumar-Rajagopalan-Sahai Coding Constructions 



The KRS schemes proposed in [6] have the following common framework based 
on an error correcting code: a session key is encoded into one or more codewords; 
for the set of excluded subscribers, some code symbols are removed and the 
remaining code symbols are encrypted in such a way that every unexcluded 
subscriber can obtain enough code symbols to recover the codewords by 
using his decryption key, while no excluded subscriber can obtain any code 
symbol. In this framework, a cover-free family plays an important role. The schemes 
differ in the construction of the cover-free family, and three constructions of the 
cover-free family are proposed in [6]. In the following, we show the detailed 
description of the common framework of the KRS schemes and the role of the 
cover-free family. 

The KRS scheme for an $(N, K)$-blacklisting problem has five auxiliary 
parameters $n$, $p$, $\alpha$, $k$ and $q$. The scheme uses an $(N, K, n, p, \alpha)$-cover-free 
family, an $(n, k)$ maximum distance separable (MDS) code over $GF(q)$ with 
$k = \lfloor \alpha p \rfloor$, and some unconditionally secure cryptosystem. 

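First, we show the definition of the $(N, K, n, p, \alpha)$-cover-free family. Consider 
a family of sets 

\[ \mathcal{S} = \{S_1, S_2, \ldots, S_N\}, \] 

where $S_i$ with $1 \le i \le N$ is a subset with size $p$ of the universe $U$ with $|U| = n$. 
Then, $\mathcal{S}$ is an $(N, K, n, p, \alpha)$-cover-free family if for any 
$S', S'_1, S'_2, \ldots, S'_K \in \mathcal{S}$, 

\[ \left| S' \setminus \bigcup_{i=1}^{K} S'_i \right| \ge \alpha p. \] 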

For simplicity, we use the $(n, k)$ MDS code over $GF(q)$, where $q$ is a power 
of 2. This assumption, that $q$ is a power of 2, is natural, since we consider the 
case where the session key is represented as a binary string. 

In [10], it is shown that the choice of parameters is optimum, i.e., minimizes 
the amount of transmission, only if the number of generated codewords in the 
KRS scheme is one. In this paper, we also choose the parameters such that 
only one codeword is generated. Then, in the KRS schemes, the 
number of encryption keys of the unconditionally secure cryptosystem used is $n$. 
Let 

\[ U = \{k_1, k_2, \ldots, k_n\} \] 

be the set of the $n$ encryption keys, which is the universe in the cover-free family. 
For a message $M$, its ciphertext, which is encrypted with the encryption key $k_h$, 
is denoted $E_{k_h}(M)$. In the following, we assume that $E_{k_h}(v_h)$ is defined as 
$k_h + v_h$ and $k_h$ is a random element of $GF(q)$. 

The KRS scheme consists of two phases, an initial set-up phase and a 
distributing phase. At the initial set-up phase, the data supplier constructs an 
$(N, K, n, p, \alpha)$-cover-free family $\mathcal{S} = \{S_1, S_2, \ldots, S_N\}$. The $i$-th 
subscriber has $S_i$ as his secret decryption key, called a secret key set. The data 
supplier knows the secret key set of every subscriber. 

Next, consider the distributing phase. A set of the excluded subscribers is 
represented by $X \subseteq \{1, 2, \ldots, N\}$. A session key is represented as a 
$k$-tuple over $GF(q)$, denoted $u$, and the enabling part for the session key is 
constructed as follows. 

(Step 1) $u$ is encoded by using the $(n, k)$ MDS code over $GF(q)$ into the 
codeword $v = (v_1, v_2, \ldots, v_n)$. 

(Step 2) For the set $X$ of excluded subscribers, the set $S(X)$ of used 
encryption keys is the set of all encryption keys except for the keys held by 
excluded subscribers in $X$, that is, 

\[ S(X) \triangleq \left\{ k_h \;\middle|\; 1 \le h \le n,\ k_h \notin \bigcup_{j \in X} S_j \right\}. \] 

(Step 3) The set $EP$ of ciphertexts is the enabling part, where 

\[ EP \triangleq \{ E_{k_h}(v_h) \mid 1 \le h \le n,\ k_h \in S(X) \}. \] 

For $EP$, every excluded subscriber can decrypt no ciphertext in $EP$, while an 
unexcluded subscriber $i$ can decrypt the ciphertexts $\{E_{k_h}(v_h) \mid k_h \in S_i \cap S(X)\}$ 
and retrieve the codeword, i.e. the session key, from the code symbols 
$\{v_h \mid k_h \in S_i \cap S(X)\}$. 
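
The distributing phase can be sketched as follows. This is only an illustration under stated assumptions: keys and code symbols are elements of $GF(2^m)$ stored as integers, so that $E_{k_h}(v_h) = k_h + v_h$ is a bitwise XOR; the MDS encoding of (Step 1) is not implemented and the codeword is taken as given; keys are indexed 0..n-1 instead of 1..n; and the toy key sets below are hypothetical and not claimed to form a cover-free family. The function names are ours.

```python
def used_key_indices(n, key_sets, excluded):
    """S(X) of (Step 2): indices h of keys held by no excluded subscriber."""
    banned = set()
    for j in excluded:
        banned |= key_sets[j]
    return [h for h in range(n) if h not in banned]


def enabling_part(keys, codeword, used):
    """EP of (Step 3). Addition in GF(2^m) is a bitwise XOR of the bit patterns."""
    return {h: keys[h] ^ codeword[h] for h in used}


def recover_symbols(keys, key_set, ep):
    """Code symbols v_h a subscriber can decrypt: those with k_h in S_i and in S(X)."""
    return {h: ep[h] ^ keys[h] for h in sorted(key_set) if h in ep}


# Hypothetical toy instance with n = 6 keys and N = 3 subscribers.
keys = [5, 12, 3, 8, 6, 10]        # k_0..k_5, elements of GF(16) as integers
codeword = [7, 1, 4, 2, 9, 3]      # v_0..v_5, assumed output of the MDS encoder
key_sets = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {0, 4, 5}}
excluded = {2}

used = used_key_indices(6, key_sets, excluded)
ep = enabling_part(keys, codeword, used)
```

In this toy run subscriber 2 recovers nothing, while subscribers 1 and 3 each recover two code symbols; whether two symbols suffice depends on the dimension $k$ of the MDS code, which the sketch leaves open.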

Security of the KRS scheme 

(R1) is satisfied since a session key is chosen randomly and independently of the 
encryption keys. The property of the cover-free family implies that (R2) is 
satisfied, since for every $X$ with $0 \le |X| \le K$ and each unexcluded subscriber 
$i \notin X$, the number of code symbols that the unexcluded subscriber $i$ can obtain, 
i.e., $|\{E_{k_h}(v_h) \mid k_h \in S_i \cap S(X)\}|$, is not less than $k = \lfloor \alpha p \rfloor$. For every $X$ with 
$0 \le |X| \le K$ and each excluded subscriber $i \in X$, there is no code symbol that 
the excluded subscriber $i$ can obtain, and thus (R3) is satisfied. 

Complexity of the KRS scheme 

(CM1') The size of storage required by the data supplier is that needed to store $n$ 
encryption keys. The size of $n$ encryption keys is $n \cdot \log q$. 

(CM2') The secret decryption key is the set of $p$ encryption keys. Then, the size 
of storage required by a subscriber is $p \cdot \log q$. 

(CM3') For a set $X$ of excluded subscribers, the necessary increase of bandwidth 
is not only the size of $|S(X)|$ ciphertexts. To indicate which code symbols are 
removed, we add a 1-bit flag for each symbol. The total length of the flags is 
$n$ bits. Therefore, for the set $X$ of excluded subscribers, the necessary increase of 
bandwidth is $|S(X)| \cdot \log q + n$. 



4 Proposed Improvement and Its Analysis 

In this section, we propose an improvement that decreases the necessary increase of 
bandwidth while the amount of storage stays the same. For a set $X$ of excluded 
subscribers, the size of the enabling part is proportional to the size of $S(X)$. As 
the improvement of the KRS schemes, we give a more suitable definition of $S(X)$ 
that decreases its size. 



4.1 Proposed Improvement 

Our idea is simple. To form $S(X)$, we exclude not only all encryption keys 
of the excluded subscribers but also some keys of unexcluded subscribers from 
$S(X)$. The definition of $S(X)$ does not affect whether the requirement (R1) is 
satisfied. The requirements (R2) and (R3) are satisfied if $S(X)$ is defined 
as follows: no encryption key held by excluded subscribers in $X$ is in $S(X)$, and 
$\lfloor \alpha p \rfloor$ encryption keys of each unexcluded subscriber are in $S(X)$. 

In the original KRS schemes, if $|X| < K$, the property of the $(N, K, n, p, \alpha)$-cover-free 
family implies that many subscribers may have more than $\lfloor \alpha p \rfloor$ 
encryption keys in $S(X)$, in order to withstand the exclusion of one or more 
further subscribers. We exclude such redundant encryption keys from $S(X)$. 

For the given set $X$ of excluded subscribers, we present the precise 
construction of $S(X)$. (Step 2) in the KRS schemes described in Sect. 3 is replaced by 
the following steps (Step 2-1)-(Step 2-3). 

(Step 2-1) A subset $Y$ of unexcluded subscribers is chosen randomly from 
$\{1, 2, \ldots, N\} \setminus X$. In order for the improved KRS scheme using any 
cover-free family to be secure, the size of $Y$ should be at most $K - |X|$. 

(Step 2-2) For each $i \in Y$, $\lfloor \alpha p \rfloor$ encryption keys are chosen from 

\[ S_i \setminus \bigcup_{j \in X} S_j \] 

as shown in Fig. 1. The set of keys chosen from $S_i$ is denoted $T_i$. Some greedy 
approaches are applicable to decrease the number of chosen keys. 

Fig. 1. The subset $T_i$ of $S_i$. 

Fig. 2. The set $S(X)$ of chosen encryption keys. 

(Step 2-3) $S(X)$ is defined by 

\[ S(X) \triangleq \left\{ k_h \;\middle|\; 1 \le h \le n,\ k_h \notin \bigcup_{j \in X \cup Y} S_j \right\} \cup \left\{ k_h \;\middle|\; 1 \le h \le n,\ k_h \in \bigcup_{i \in Y} T_i \right\} \] 

as shown in Fig. 2(a). 

The further excluded encryption keys of unexcluded subscribers are a subset 
of $\bigcup_{i \in Y} S_i \setminus \bigcup_{j \in X} S_j$. The size of $S(X)$ is close to that of $S(Z)$ 
with $|Z| = K$, i.e. the minimum size. In the following, $|Y|$ is defined as $K - |X|$ 
to decrease the necessary increase of bandwidth as much as possible. 
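
Steps (Step 2-1) to (Step 2-3) can be sketched as below. This is an illustrative reading, not the authors' implementation: keys are indexed 0..n-1, `k` stands for the number of keys kept per chosen subscriber, the candidate pool for $T_i$ is assumed to be $S_i$ minus the keys of excluded subscribers, and $T_i$ is taken to be simply the first `k` candidates in sorted order (the text only says that greedy approaches are applicable). The toy key sets are hypothetical.

```python
import random


def choose_helpers(N, excluded, K, rng):
    """(Step 2-1): a random set Y of K - |X| unexcluded subscribers (fewer if unavailable)."""
    pool = [i for i in range(1, N + 1) if i not in excluded]
    return set(rng.sample(pool, min(K - len(excluded), len(pool))))


def improved_used_keys(n, key_sets, excluded, helpers, k):
    """Improved S(X) of (Step 2-3), returned as a sorted list of key indices."""
    banned_x = set().union(*(key_sets[j] for j in excluded))
    chosen = set()
    for i in sorted(helpers):
        candidates = sorted(key_sets[i] - banned_x)
        if len(candidates) < k:
            raise ValueError("subscriber %d has fewer than k usable keys" % i)
        chosen |= set(candidates[:k])  # T_i; "first k" is an arbitrary stand-in
    banned_xy = banned_x.union(*(key_sets[i] for i in helpers))
    return sorted({h for h in range(n) if h not in banned_xy} | chosen)


# Hypothetical toy instance: n = 10 keys, N = 3 subscribers, k = 2.
key_sets = {1: {0, 1, 2, 3}, 2: {4, 5, 6, 7}, 3: {0, 4, 8, 9}}
improved = improved_used_keys(10, key_sets, excluded={2}, helpers={1}, k=2)
```

In this toy run the original definition would use the six keys {0, 1, 2, 3, 8, 9}, while the improved definition uses four, and both unexcluded subscribers still hold at least two used keys.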



4.2 Analysis 

In this section, we analyze the security and the complexity of the improved 
KRS scheme. 

We can show that the requirements (R1) and (R3) are satisfied in the same 
way as for the original KRS scheme. The requirement (R2) is satisfied if 
the following (R2a) and (R2b) are satisfied: 

(R2a) For each unexcluded subscriber $i \in Y$, $|S_i \cap S(X)| \ge \lfloor \alpha p \rfloor$; 

(R2b) For each unexcluded subscriber $i \notin Y$, $|S_i \cap S(X)| \ge \lfloor \alpha p \rfloor$. 

From (Step 2-2) in Sect. 4.1, (R2a) is satisfied, since 

\[ S_i \cap S(X) \supseteq S_i \cap \left\{ k_h \;\middle|\; 1 \le h \le n,\ k_h \in \bigcup_{i' \in Y} T_{i'} \right\} \supseteq S_i \cap T_i = T_i, \] 

and $|S_i \cap S(X)| \ge |T_i| = \lfloor \alpha p \rfloor$. From the remark in (Step 2-1), $|Y|$ is not more 
than $K - |X|$. Then (R2b) is also satisfied, since the excluded encryption keys 
$U \setminus S(X)$ are chosen from $K$ or fewer secret key sets, and by the property of the 
$(N, K, n, p, \alpha)$-cover-free family. 

The storage complexities (CM1') and (CM2') are the same as those 
in the original KRS schemes. The complexity (CM3'), the necessary increase of 
bandwidth, is given by $|S(X)| \cdot \log_2 q + n$. Though $|S(X)|$ depends on the set 
$X$ of excluded subscribers and the cover-free family used, the following lemma 
gives a simple upper bound on $|S(X)|$ which is derived from the definition of 
$S(X)$ in Sect. 4.1. 

Lemma 1. For $X$ with $0 \le |X| \le K$ and $Y$ with $X \cap Y = \emptyset$ and $|Y| = K - |X|$, 
\[ |S(X)| \le |S(X \cup Y)| + (K - |X|) \cdot \lfloor \alpha p \rfloor. \] 

We will derive the average of $|S(X \cup Y)|$ when the family $\mathcal{S}$ is chosen from a 
given $\Omega$, to obtain an upper bound on the average of $|S(X)|$. Since $|X \cup Y| = K$, 
it is enough to derive the average of $|S(Z)|$ with $|Z| = K$. Let $\sigma(Z, \mathcal{S})$ denote 
the size of $S(Z)$ when the cover-free family $\mathcal{S}$ is used. We will give the average of 
$\sigma(Z, \mathcal{S})$ over the family $\Omega$ of the cover-free families under the assumption that 
each family in $\Omega$ is equiprobable. The average is denoted $\bar{\sigma}(Z, \Omega)$. 

In [6], three constructions of the cover-free family are proposed. One construction 
is based on algebraic geometry codes, and it is easy to see that the generated 
cover-free family is impractical [9, 10]. 

First, we consider the improved KRS scheme using the randomized construction 
proposed in [6]. In the randomized construction, for given $N$ and $K$, the 
$(N, K, n, p, \alpha)$-cover-free family $\mathcal{S} = \{S_1, S_2, \ldots, S_N\}$ is constructed as follows: 

(1) Choose a positive integer $p$ and its multiple $n$ with $n \ge 2p$. Generate the set 
$U$ of the $n$ encryption keys. 

(2) Partition the set $U$ into $p$ disjoint subsets $U_1, U_2, \ldots, U_p$. Each subset 
contains $n/p$ keys. 

(3) $S_i$ with $1 \le i \le N$ is obtained by choosing one key randomly from each 
subset $U_h$ with $1 \le h \le p$. 

This is a probabilistic construction. To obtain a cover-free family with 
sufficiently high probability, the parameters should be chosen properly. 
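
The three steps above can be sketched as follows, together with a brute-force test of the cover-free property that is only feasible for tiny parameters. The sketch is ours: keys are indexed 0..n-1, block $U_h$ is taken to be a run of n/p consecutive indices, and the checker assumes the cover-free condition is meant for a set $S'$ and $K$ other sets of the family.

```python
import random
from itertools import combinations


def randomized_family(N, n, p, rng):
    """Steps (1)-(3): one key per block U_h, chosen uniformly at random."""
    if n % p != 0 or n < 2 * p:
        raise ValueError("n must be a multiple of p with n >= 2p")
    size = n // p
    blocks = [range(h * size, (h + 1) * size) for h in range(p)]
    return {i: {rng.choice(block) for block in blocks} for i in range(1, N + 1)}


def is_cover_free(family, K, alpha, p):
    """Brute force: |S' minus the union of any K other sets| >= alpha * p."""
    for i, s in family.items():
        others = [t for j, t in family.items() if j != i]
        for group in combinations(others, min(K, len(others))):
            if len(s - set().union(*group)) < alpha * p:
                return False
    return True


family = randomized_family(5, 12, 3, random.Random(1))
```

Because the construction is probabilistic, `is_cover_free(family, K, alpha, 3)` may well return False for a particular draw; in that case the draw would simply be repeated or the parameters enlarged.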



Let $\Omega_R$ be the set of all families constructed by the randomized construction. 
The following lemma shows $\bar{\sigma}(Z, \Omega_R)$. The average is an approximation since $\Omega_R$ 
includes families which are not cover-free families. However, if the success 
probability is sufficiently close to 1, then the great majority of the families are 
cover-free families, and the accuracy of this approximation is sufficient for 
evaluating the complexity. 

Lemma 2. Let $\Omega_R$ be the set of all families constructed by the randomized 
construction. For a set $Z$ of $K$ excluded subscribers, 

\[ \bar{\sigma}(Z, \Omega_R) = n \left( 1 - \frac{p}{n} \right)^K. \tag{1} \] 

(Proof) Since $U_1, U_2, \ldots, U_p$ are disjoint subsets of $U$, $|S(Z)|$ is equal to 
$|S(Z) \cap U_1| + |S(Z) \cap U_2| + \cdots + |S(Z) \cap U_p|$. 

From Eq. (8.10) in [12], the average of the sum of the sizes $|S(Z) \cap U_h|$ with 
$1 \le h \le p$ is equal to the sum of their averages. For $Z$, the average size of 
$|S(Z) \cap U_h|$ with $1 \le h \le p$ is equal to the average size of $|S(Z) \cap U_{h'}|$ 
with $1 \le h' \le p$ and $h' \ne h$. Then, $\bar{\sigma}(Z, \Omega_R)$ is derived from the 
average size of $S(Z) \cap U_1$. 

The set $S(Z) \cap U_1$ is constructed by removing all the encryption keys in 
the $K$ secret key sets from $U_1$. For any encryption key, the probability that 
the key is not in a given secret key set is equal to $1 - \frac{p}{n}$. The reason is that for 
each of the $K$ secret key sets, one encryption key in $U_1$ is chosen randomly from 
$n/p$ encryption keys. For $U_1$, using the principle of inclusion and exclusion, the 
average number of encryption keys which are not in any of the $K$ secret key sets 
is equal to $\frac{n}{p} \left( 1 - \frac{p}{n} \right)^K$. This average is equal to the average size of $S(Z) \cap U_1$. 

Then, $\bar{\sigma}(Z, \Omega_R) = p \cdot \frac{n}{p} \left( 1 - \frac{p}{n} \right)^K = n \left( 1 - \frac{p}{n} \right)^K$. □ 
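
For small parameters the average in Lemma 2 can be confirmed by exhaustive enumeration rather than simulation. The sketch below is ours: it enumerates every way in which K subscribers can each pick one key from a single block of b = n/p keys, counts the keys left unpicked, and compares p times that average with the closed form $n(1 - p/n)^K$, read here as the statement of Lemma 2. Exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def exact_average_block(b, K):
    """Average number of keys of one block (size b) held by none of K subscribers."""
    total = 0
    for picks in product(range(b), repeat=K):
        total += b - len(set(picks))
    return Fraction(total, b ** K)


def lemma2_average(n, p, K):
    """Closed form n * (1 - p/n)^K for the average size of S(Z)."""
    return n * (1 - Fraction(p, n)) ** K


def exact_average(n, p, K):
    """p blocks of size n/p, each contributing the same average by symmetry."""
    return p * exact_average_block(n // p, K)
```

For example, `exact_average(12, 3, 3)` and `lemma2_average(12, 3, 3)` both evaluate to 81/16.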

From Lemmas 1 and 2, the upper bound of the average necessary increase of 
bandwidth is derived. 

Theorem 1. For a set $X$ of $K$ or fewer excluded subscribers, the average 
necessary increase of bandwidth is upper bounded by 

\[ \left( n \left( 1 - \frac{p}{n} \right)^K + (K - |X|) \cdot \lfloor \alpha p \rfloor \right) \cdot \log q + n, \tag{2} \] 

under the assumption that any family constructed by the randomized construction 
is used in the improved KRS scheme. 
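
The bound of Theorem 1 is straightforward to evaluate. The helper below is a sketch under the assumption that the bound reads $\left( n (1 - p/n)^K + (K - |X|) \cdot k \right) \log q + n$ with $k$ the number of keys kept per chosen subscriber; `k` and `log_q` (the key size in bits) are passed in directly, and the parameter values in the usage note are hypothetical, not taken from the paper.

```python
def avg_bandwidth_bound(n, p, K, x_size, k, log_q):
    """Upper bound on the average necessary increase of bandwidth, in bits."""
    if not 0 <= x_size <= K:
        raise ValueError("|X| must lie between 0 and K")
    avg_full = n * (1 - p / n) ** K          # average |S(Z)| for |Z| = K (Lemma 2)
    return (avg_full + (K - x_size) * k) * log_q + n
```

With the hypothetical values n = 12, p = 3, K = 3, k = 1 and 8-bit keys, the bound is 52.5 bits when all K subscribers are excluded and 68.5 bits when only one is, since each of the K - |X| additional subscribers contributes at most k further ciphertexts.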

Secondly, we consider the improved KRS scheme using the polynomial 
construction of the cover-free family proposed in [6]. In the polynomial construction, 
$n$ is chosen as $n = p^2$, and $N$ distinct polynomials are generated to decide which 
$p$ out of $p^2$ encryption keys are assigned to each secret key set. Let $\Omega_{P,i}$ be the set 
of all cover-free families constructed by the polynomial construction where the 
degree of all generated polynomials is limited to $i$ or less. The following lemma 
shows $\bar{\sigma}(Z, \Omega_{P,1})$. 

Lemma 3. For a set $Z$ of $K$ excluded subscribers, 



K 



a{Z, f2p^i) = 



-Kp+j2i-^y 



K 






.P^ -P, 



P- J 
P‘^-j 






otherwise. 



(3) 



It should be noted that $\bar{\sigma}(Z, \Omega_R)$ given by (1) is a good approximation of 
$\bar{\sigma}(Z, \Omega_{P,i})$ when [...] approaches 0 rapidly as $i$ increases. For $\Omega_{P,2}$, a 
similar formula can be derived. For many practical cases, the optimum choices of 
parameters imply that the cover-free family in $\Omega_{P,i}$ with $i = 1, 2$ is used. In the 
next section, we show numerical results for both the improved and the original 
KRS schemes with the polynomial construction. 

5 Numerical Results 

In this section, we show the numerical results for the proposed improvement. As 
discussed in Sect. 4.2, compared with the original KRS scheme, the complexities 
(CMl’) and (CM2’) are the same, and (CM3’) is not worse than that of the 
original KRS scheme. We focus on the results for (CM3’). The average amount of 
the necessary increase of bandwidth is decreased significantly when the number 
of excluded subscribers is often much smaller than K. If the number of excluded 
subscribers is always K, then the average amount is the same. These two cases 
are extremes. We show the results for more practical cases such that, for every 9 
with 0 < 9 < K, the probability with \X\ = 9 is the same, i.e., is equal to 7^^. 

Generally, the size of the session key, denoted £, is defined by what the session 
key is used for. Then, it is assumed that the value of I with £ > 0 is also given 
as those of N and K. We consider the case that the size of the session key is 
relatively small, as in [4,9, 10]. 

In Fig. 3, we show the average necessary increases of bandwidth for = 10^, 
K with 50 < AT < 260 and £ = 128. We compare the improved KRS scheme with 
the optimized original KRS scheme in [10] and the most efficient schemes based 
on threshold secret sharing with threshold AT -F 1 in [1, 8], called the TSS scheme 
here. The reason that we compare the KRS scheme with the TSS scheme is that 
the performance of the KRS scheme can be close to that of the most efficient 
schemes even if the latter is asymptotically more efficient than the former. In [1], 
it is shown that the asymptotic complexities of transmission for the KRS scheme 
and the TSS scheme are O(AT^) and 0{K), respectively. 

The five parameters n, p, a, k and q in the improved KRS scheme and 
the original KRS scheme are determined based on the method in [10] which 
minimizes the worst necessary increase of bandwidth in the original KRS scheme. 
The parameters in the TSS scheme are chosen properly for N, K and ℓ. The 
precise necessary increase of bandwidth is K · (ℓ + ⌈log N⌉) bits. 

From the comparison, we see that there are practical sizes, say (N, K, ℓ) = 
(10^4, 100, 128), such that the average necessary increase of bandwidth in the 
improved KRS scheme is 70 percent of that for the optimized original KRS scheme. 
This means that the average amount of the necessary increase of bandwidth is 
6.2 times as much as that in the TSS scheme, while for the optimized 
original KRS scheme, it is 8.9 times. 

Fig. 3. The average necessary increase of bandwidth for the fixed N = 10,000 and 
ℓ = 128. 
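The quoted figures can be tied together with a short calculation. The sketch below assumes that the expression K · (ℓ + ⌈log N⌉) is the cost of the TSS scheme and that the logarithm is taken to base 2; neither is stated explicitly in the text, and the function name is only illustrative.

```python
import math

def tss_bandwidth_bits(num_excluded: int, key_bits: int, num_subscribers: int) -> int:
    """K * (l + ceil(log2 N)) bits; the base-2 logarithm is an assumption."""
    id_bits = math.ceil(math.log2(num_subscribers))
    return num_excluded * (key_bits + id_bits)

N, K, ELL = 10_000, 100, 128
tss = tss_bandwidth_bits(K, ELL, N)      # bits for the TSS scheme
improved_krs = 6.2 * tss                 # average, improved KRS (factor from the text)
original_krs = 8.9 * tss                 # average, optimized original KRS
ratio = improved_krs / original_krs      # should be close to the quoted 70 percent
```

Under these assumptions the two quoted factors, 6.2 and 8.9, are consistent with the quoted 70 percent.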



6 Conclusion 

The KRS schemes in [6] are improved. The amount of transmission is decreased 
significantly when the number of excluded subscribers is often much smaller 
than K. The amount of transmission in the worst case is the same as that in 
the original scheme. For some typical sizes of a blacklisting problem, the average 
amount of transmission is at most several times that of the most efficient 
solutions for a blacklisting problem, even though the KRS schemes are asymptotically 
less efficient than the most efficient schemes. From the results, we see that there 
are sizes for which the KRS schemes are efficient enough to be of practical use. 

Abstract. In an earlier paper, bit flipping methods for decoding low-density 
parity-check (LDPC) codes on the binary symmetric channel were adapted for 
generalised LDPC (GLDPC) codes with Hamming sub-codes. We now employ 
the analysis of this weighted bit flipping method to develop a simple 
soft-input/soft-output decoder for Hamming codes based on a conventional hard 
decision decoder. Simulation results are presented for decoding of both product 
codes and GLDPCs on the AWGN channel using the proposed decoder. At 
higher rates the indications are that good performance is possible at very low 
decoding complexity. 



1 Introduction 

In recent years, iterative coding schemes have proved successful in approaching 
channel capacity limits with low decoded bit error rates (BERs) and practical 
decoding complexity [1] [5] [6]. We include GLDPCs in this statement: these are 
defined by random sparse graphs with constraint nodes that relate to error-correcting 
codes [5] [8]. Simple bit flipping (BF) decoding methods for LDPCs have been 
principally investigated on the binary symmetric channel (BSC) [4] [6] [7] (but have 
also been extended to the AWGN channel [4]). 

We have successfully generalised BF methods for LDPCs to a weighted bit 
flipping (WBF) method for GLDPCs with Hamming sub-codes [2] [3]. Applying 
WBF to codes of long length, the cut-off rate of the BSC may be approached at 
relatively low decoding cost. In this paper, we extend the iterative WBF method to the 
AWGN channel. 



2 Motivation 

We first review the WBF approach for Hamming-based schemes [2] [3]. For the 
product code (PC) and GLDPC constructions employed, all symbols will belong to 
two sub-codes as shown in Fig. 1. Two interlocking (7,4,3) Hamming component 
codewords are shown with only one symbol in common. 

Fig. 1. Example Votes from Sub-Code Decoders. 

Component hard decision decoders (HDDs) are used to generate symbol votes as 
illustrated. The decoders need only allow for two instances, 

• all-zero syndromes => vote V sent to all symbols (see sub-code 1 in Fig. 1). 

• non-zero syndromes => vote E sent to error position and vote e sent to all other 
positions (see sub-code 2 in Fig. 1 where ‘X’ indicates error position). 

Votes V, e and E are given numerical values such that the vote pairs of symbols, 
either VV, eV, ee, EV, Ee or EE, may be ranked in terms of reliability by the sum of 
their constituent votes. At each iteration the WBF technique proceeds by: (i) applying 
the HDDs, (ii) ranking symbols by their vote pairs and (iii) flipping all bits of lowest 
reliability before a new iteration begins. For extended Hamming codes an additional 
vote D is also required for all symbols of codewords where double errors are detected. 

Experimental optimisation and Bayesian analysis both point to weights of V = +1, e 
= 0 and E = -1 as producing lowest BERs where we use larger weights to represent 
higher reliability [3]. Analysis in terms of an APP decoding strategy yields identical 
optimum weights [2]. It is this approach that will be extended. 
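The voting and flipping steps can be sketched for a small product code built from two (7,4,3) Hamming codes. This is an illustration only: the parity-check convention (column i of H is the binary representation of i + 1), the 7 x 7 array layout and the function names are assumptions, and the stopping rule is simply "all vote pairs are VV".

```python
V, e, E = +1, 0, -1  # vote weights quoted in the text

def hamming_votes(word):
    """Votes from a hard-decision decoder for one length-7 Hamming word.

    Assumed convention: column i of H is the binary representation of
    i + 1, so a non-zero syndrome directly names the error position.
    """
    syndrome = 0
    for i, bit in enumerate(word):
        if bit:
            syndrome ^= i + 1
    if syndrome == 0:
        return [V] * 7                 # all-zero syndrome: V to every symbol
    votes = [e] * 7                    # non-zero syndrome: e everywhere ...
    votes[syndrome - 1] = E            # ... except E at the error position
    return votes

def wbf_iteration(grid):
    """One WBF iteration on a 7x7 array whose rows and columns are codewords."""
    reliability = [[0] * 7 for _ in range(7)]
    for r in range(7):
        for c, vote in enumerate(hamming_votes(grid[r])):
            reliability[r][c] += vote
    for c in range(7):
        column = [grid[r][c] for r in range(7)]
        for r, vote in enumerate(hamming_votes(column)):
            reliability[r][c] += vote
    lowest = min(min(row) for row in reliability)
    if lowest == 2 * V:                # every vote pair is VV: nothing to flip
        return grid, False
    flipped = [[bit ^ int(reliability[r][c] == lowest)
                for c, bit in enumerate(row)]
               for r, row in enumerate(grid)]
    return flipped, True

# Usage: the all-zero array is a codeword; inject one error and decode.
received = [[0] * 7 for _ in range(7)]
received[2][5] = 1
decoded, changed = wbf_iteration(received)
```

In this example the erroneous symbol receives the pair EE (sum -2), its row and column neighbours receive eV (sum +1), and all remaining symbols receive VV (sum +2), so only the erroneous bit is flipped.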



3 HDD-SISO Decoder 

The proposed soft-input/soft-output (SISO) decoder is based around the single 
application of a HDD and hence we will refer to it as a HDD-SISO decoder. The 
decoder is derived here for the simple case of a (7,4,3) Hamming code using the 
code’s full weight distribution. For larger codes the analysis may be extended using 
only the first term in the distribution. 

In the first stage of the decoder, for each symbol position j there is an ‘averaging’ 
process over the other (n - 1) LLR inputs, i.e. all $L(u_i)$ where $i \in [1, n]$ and $i \neq j$. An 
average probability of bit error in the other symbol positions, $p_j$, may be approximated 
in terms of a least reliable input, as follows, 

\[ p_j = \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} \frac{1}{1 + \exp(|L(u_i)|)} \approx \frac{1}{n-1} \cdot \frac{1}{1 + \exp(L^{\min}(u_j))} \tag{1} \]

where $L^{\min}(u_j) = \min_{i \neq j} |L(u_i)|$ is the smallest input magnitude among the other positions. 

Subsequently, the magnitude of an average soft input for the other (n - 1) positions 
may be found by representing the above BER in the form of an LLR, 

\[ L(p_j) = \log \frac{1 - p_j}{p_j} \approx L^{\min}(u_j) + \log(n-1) . \]

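In the second stage of the decoder, we employ analysis of an optimal SISO 
component decoder on the BSC with transition probability p to derive extrinsic LLRs 
[2]. In this case, the extrinsic LLR per symbol, $L^e(u_j)$, may be determined using the 
voting system of Section 2 and the following expressions, 

\[ L^e(u_j) = S(u_j) \cdot A(p) \cdot \begin{cases} +1 & \text{for symbol vote } V \\ \phantom{+}0 & \text{for symbol vote } e \\ -1 & \text{for symbol vote } E \end{cases} \tag{3} \]

where 

\[ A(p) = \log \frac{(1-p)^6 + 4p^3(1-p)^3 + 3p^4(1-p)^2}{3p^2(1-p)^4 + 4p^3(1-p)^3 + p^6} \approx 2 \log \frac{1-p}{p} - \log 3 , \]

\[ S(u_j) = \begin{cases} +1 & \text{if received bit } r_j = 0 \\ -1 & \text{if received bit } r_j = 1 . \end{cases} \]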
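The magnitude A(p) in (3) can be checked by brute force for the (7,4,3) code. The sketch below enumerates the 16 codewords and computes the exact extrinsic LLR of one bit when an all-zero-syndrome word is received on a BSC, then compares it with a closed form built from the weight distribution and with the approximation 2 log((1-p)/p) - log 3. The parity-check convention and the numerator of the closed form are reconstructions made here, not quotations from the paper.

```python
import math
from itertools import product

def syndrome(word):
    """Syndrome under the assumed convention: column i of H is binary(i + 1)."""
    s = 0
    for i, bit in enumerate(word):
        if bit:
            s ^= i + 1
    return s

CODEWORDS = [w for w in product((0, 1), repeat=7) if syndrome(w) == 0]

def extrinsic_llr_all_zero(p, j=0):
    """Exact extrinsic LLR of bit j when the all-zero word is received on a BSC(p)."""
    likelihood = {0: 0.0, 1: 0.0}
    for cw in CODEWORDS:
        d = sum(cw[i] for i in range(7) if i != j)   # disagreements outside position j
        likelihood[cw[j]] += p ** d * (1 - p) ** (6 - d)
    return math.log(likelihood[0] / likelihood[1])

def a_closed_form(p):
    """Closed form from the weight distribution (numerator reconstructed here)."""
    q = 1 - p
    num = q ** 6 + 4 * p ** 3 * q ** 3 + 3 * p ** 4 * q ** 2
    den = 3 * p ** 2 * q ** 4 + 4 * p ** 3 * q ** 3 + p ** 6
    return math.log(num / den)

def a_approx(p):
    """First-term approximation used when linearising the decoder."""
    return 2 * math.log((1 - p) / p) - math.log(3)
```

The approximation keeps only the dominant terms, so it is tightest for small p and drifts as p grows.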



In the HDD-SISO decoder, we combine the two stages as follows. Firstly, the signs 
of the input LLRs are used as the received bits in the above equation, i.e. we replace 
$S(u_j)$ in (3) with $\mathrm{sgn}(L(u_j))$. Secondly, the average BER in the inputs not including 
position j replaces the BSC transition probability, i.e. we replace $\log\{(1-p)/p\}$ in (3) 
with $\log\{(1-p_j)/p_j\}$. After making these substitutions the proposed decoder may be 
described in terms of the following equations, 

\[ L^e(u_j) = S(u_j) \cdot A(p_j) \cdot \begin{cases} +1 & \text{for symbol vote } V \\ \phantom{+}0 & \text{for symbol vote } e \\ -1 & \text{for symbol vote } E \end{cases} \tag{4} \]

where 

\[ S(u_j) = \mathrm{sgn}(L(u_j)) . \]

Hence, the decoder may be implemented as follows. Firstly, the two minimum 
absolute LLRs are found at the decoder’s inputs. Next, the HDD is applied to the hard 
decision-limited inputs and the vote labels V, e and E assigned in the same manner as 
for WBF. Finally, equation (4) is employed to map the minimum LLR inputs to 
extrinsic values. For the symbol position with smallest LLR input, the second smallest 
input is employed in $L^{\min}(u_j)$, whereas for all other positions the smallest input is 
employed in $L^{\min}(u_j)$. We note that the relationship between $L^{\min}(u_j)$ and $L^e(u_j)$ is linear. 
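The three implementation steps can be sketched as follows for a single (7,4,3) Hamming codeword. Several choices here are assumptions rather than details taken from the paper: the parity-check convention (column i of H is the binary representation of i + 1), the sign convention (a non-negative LLR means bit 0), and the use of the linearised magnitude 2(L^min + log(n-1)) - log 3. The scaling and clipping of extrinsic values mentioned in the simulations is omitted.

```python
import math

def hdd_siso(llrs):
    """Extrinsic LLRs for one (7,4,3) Hamming codeword (illustrative sketch)."""
    n = len(llrs)
    if n != 7:
        raise ValueError("this sketch only handles the (7,4,3) Hamming code")

    # Step 1: the two smallest input magnitudes.
    ranked = sorted((abs(l), i) for i, l in enumerate(llrs))
    (min1, argmin), (min2, _) = ranked[0], ranked[1]

    # Step 2: hard decisions, syndrome, and votes V = +1, e = 0, E = -1.
    syndrome = 0
    for i, l in enumerate(llrs):
        if l < 0:                      # hard decision 1
            syndrome ^= i + 1
    votes = [+1] * n if syndrome == 0 else [0] * n
    if syndrome:
        votes[syndrome - 1] = -1

    # Step 3: map the minimum LLR magnitude to an extrinsic value.
    extrinsic = []
    for j in range(n):
        l_min = min2 if j == argmin else min1     # exclude position j itself
        magnitude = 2 * (l_min + math.log(n - 1)) - math.log(3)
        sign = 1 if llrs[j] >= 0 else -1
        extrinsic.append(sign * magnitude * votes[j])
    return extrinsic

# Usage: position 2 is the least reliable input and fails the parity checks.
example = hdd_siso([2.0, 3.0, -0.5, 4.0, 2.5, 3.5, 1.5])
```

In the usage example only position 2 receives a non-zero extrinsic value, and its sign opposes the input sign, which is the soft counterpart of the hard decision decoder flipping that bit.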



4 Simulation Results 

The decoder was employed in simulations of a (64,57,4) extended Hamming product 
code on the AWGN channel. An iterative APP decoding strategy rather than a serial 
turbo decoding strategy was used. In this case, row and column extrinsics are 
determined in parallel and combined to form a single a priori input to the next 
iteration [2]. Extrinsic LLRs are attenuated by both scaling and clipping such that 
decoded BERs may be optimised. The results of Fig. 2 show that for iteration 16 
performance is close to iteration 2 of turbo decoding employing a more complex 
Chase SISO decoder. ($E_b/N_0$ is the ratio of energy per information bit to noise spectral 
density; the other axis shows the decoded BER.) A ‘fairer’ comparison is with the Chase decoder 
when employed in an identical APP strategy and in this case performance is very 
close. Gains over the purely hard decision WBF are around 1.3 dB. 

The same decoder has been successfully applied to (64,57,4) extended Hamming-based 
GLDPCs with rate R = 50/64 and various block lengths. In Fig. 3 we can see 
that for each GLDPC, 32 iterations of the proposed method show a loss of ≈ 0.85 to 
1.00 dB over 16 iterations of turbo decoding. Average gains of the APP strategy using 
the HDD-SISO decoder over hard decision WBF are ≈ 1.35 dB. For the long block 
length GLDPCs, we highlight the steepness of the BER curves and that good 
performance is possible within approximately 1.80 dB of the appropriate Shannon 
limit. For lower rate coding schemes based on (15,11,3) Hamming sub-codes the 
decoder is found to be much less successful. 



5 Conclusions 

Using analysis of weighted bit flipping decoding of Hamming-based codes on the 
BSC, we have developed a simple SISO decoder for Hamming codes. The operation 
of the HDD-SISO decoder can be described in simple terms: the corrective action of 
the hard decision decoder, as represented by the extrinsic LLRs, is simply weighted to 
mirror the average reliability of the decoder’s inputs using $L^{\min}(u_j)$. For PCs and 
GLDPCs based on high rate Hamming sub-codes, the decoder achieves good 
performance within ≈ 2.0 dB of the Shannon limit. Importantly, the gradients of the 
BER plots are found to be steep. The indications are that very low BERs may be 
achieved with computational cost per iteration little more than that of using a 
conventional HDD. To properly assess the usefulness of the new decoder, ongoing 
research is concentrated on performance bounds. 

Fig. 2. Iterative Decoding of (64,57,4) Extended Hamming Product Code. Shows Turbo/Chase, 
Iterations 1, 2, 4 and 6 (lines with ‘o’s) and APP/HDD-SISO, Iterations 1, 8 and 16 (lines with 
‘+’s). Also uncoded (solid line) and Shannon limit for rate R = 0.793 (shaded area) 

Fig. 3. Iterative Decoding of (64,57,4) Extended Hamming-based GLDPCs. Shows 
Turbo/Chase, Iteration 16 (lines to left) and APP/HDD-SISO, Iteration 32 (lines to right). 
Block lengths are 4096 (‘+’s), 8192 (‘o’s), 16384 (‘*’s), 32768 (‘x’s), 65536 (‘•’s) bits. Also 
uncoded (solid line) and Shannon limit for rate R = 0.781 (shaded area) 

Abstract. The classic “black-box” view of cryptographic devices such as smart 
cards has been invalidated by the advent of the technique of Differential Power 
Analysis (DPA) for observing intermediate variables during normal operation 
through side-channel observations. An information-theoretic approach leads to 
optimal DPA attacks and can provide an upper bound on the rate of information 
leakage, and thus provides a sound basis for evaluating countermeasures. This 
paper presents a novel technique of random affine mappings as a DPA 
countermeasure. The technique increases the number of intermediate variables 
that must be observed before gleaning any secret information and randomly 
varies these variables on every run. This is done without duplication of the 
processing of variables, allowing very efficient DPA resistant cipher 
implementations where the ciphers are designed to minimise overheads. A real- 
world system has been developed within the tight computational constraints of 
a smart card to exhibit first-order DPA-resistance for all key processing. 



1 Introduction 

Differential Power Analysis (DPA) is a major new threat to the secrecy of data 
processed in secure tokens such as smart cards. A thorough understanding of the 
attack and how to defend against it needs a formal approach, for which an 
information-theoretic approach provides a solid basis. 

This paper illustrates this approach and presents a solution that is useful primarily 
for cryptographic applications and for certain other types of processing, including 
some common operations such as data storage. 



1.1 What Is DPA? 

The classic “black box” cryptographic security assumption on the processing of secret 
data in a purpose-built device, such as a tamper-resistant smart card, has been 
invalidated with the advent of Differential Power Analysis (DPA). The technique of 
DPA exploits the fact that electronic hardware leaks information via side-channels as 
depicted in Fig. 1, from which secret data may be extracted with little effort. Some 
other attacks exist that challenge the black box assumption, such as Differential Fault 
Analysis, but none is as effective or as difficult to defend against as DPA. 

In the example of analysing secret data being processed by a smart card, DPA 
generally seeks to extract the tiny data-dependent differences in observable 
waveforms (such as the current drawn by a device, emitted and conducted radiation), 
and to relate these to the internal values being processed. 




[Figure: a cryptographic operation with input data x, secret key k and output data y, leaking a signal z through a side-channel.] 

Fig. 1. A cryptographic operation with side-channel information 

Cryptographic algorithms lend themselves to such analysis because their intermediate 
variables are usually not correlated to anything (they tend to display “ideal” 
randomness) except a correct hypothesis on the values in the calculation. It follows 
that valid hypotheses on internal values may be verified by finding such a correlation. 
This rather general approach (due to Kocher [2]) identifies a correlation with 
intermediate variables predicted by hypotheses on small numbers of bits of the 
cryptographic key. An attacker who has the specification of a cryptographic 
algorithm typically has a good chance of selecting an intermediate variable that has a 
physical analogue in an implementation. The attacker can locate a point in a sequence 
of waveforms of the current consumed by a smart card that exhibits a correlation with 
an intermediate variable deduced from a correct hypothesis on the key and certain 
known data (usually ciphertext). Often a few hundred known ciphertexts and 
associated side-channel observations are sufficient to determine a cryptographic key. 

This approach is frequently illustrated in the context of the well-known DES 
algorithm. In the first round of processing, where the input data to the cipher is 
known (or the last round, where the output is known) for a sequence of differing data 
inputs but using the same key, each output bit of each lookup table (s-box) may be 
used for this purpose. Each bit depends on the changing input (output) and only six 
bits of the key. For each of the 64 possible combinations of these six bits (these being 
the hypotheses from which we wish to determine the correct one), the lookup table 
output bit may be determined. Using the bit value calculated from the data and the 
hypothesised key bits, we can classify a set of say a thousand observed current 
waveforms into two sets, and find the difference of the average of each of the 
resulting two sets of waveforms. We will then have 64 “differential” waveforms, 
which we can search for peaks, which result from the correlation mentioned and 
correspond to the point in time when the lookup table output bit of interest is being 
processed. Generally, one of the 64 differential waveforms will have significantly 
larger peaks than the others have. Repeating this for one output bit of each lookup 
table allows determination of the 48 bits of the key used during the first (last) round. 
Using the known values of these bits, the remaining 8 bits of the key may be found by 
extending the analysis to the next round. Many authors suggest that these bits may be 
found using a brute force attack, but this requires a known plaintext/ciphertext pair. 
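The difference-of-means procedure can be sketched on simulated data. The example below is a toy model rather than an attack on DES: a hypothetical 4-bit S-box stands in for a DES lookup table, each "waveform" is reduced to a single sample that leaks one S-box output bit plus Gaussian noise, and there are 16 key hypotheses instead of 64. All names and parameters are illustrative assumptions.

```python
import random

# Hypothetical 4-bit S-box standing in for a DES lookup table (an assumption
# made to keep the sketch short; a DES s-box maps 6 bits to 4).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def target_bit(data, key_guess, bit=0):
    """The S-box output bit predicted from known data and a key hypothesis."""
    return (SBOX[data ^ key_guess] >> bit) & 1

def simulate_traces(secret_key, count, noise_sigma, rng):
    """One leaked sample per run: the target bit plus Gaussian noise."""
    traces = []
    for _ in range(count):
        data = rng.randrange(16)
        sample = target_bit(data, secret_key) + rng.gauss(0.0, noise_sigma)
        traces.append((data, sample))
    return traces

def differential(traces, key_guess):
    """Difference of the mean samples of the two sets selected by the guess."""
    ones = [s for d, s in traces if target_bit(d, key_guess) == 1]
    zeros = [s for d, s in traces if target_bit(d, key_guess) == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

rng = random.Random(1)
SECRET = 0xB
traces = simulate_traces(SECRET, 4000, 1.0, rng)
differentials = {k: differential(traces, k) for k in range(16)}
best_guess = max(differentials, key=differentials.get)
```

For the correct hypothesis the two sets are separated exactly by the leaked bit, so the differential approaches the signal amplitude. With such a small S-box, some wrong hypotheses can be strongly correlated with the correct one, so secondary peaks should be expected.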



1.2 Information Leakage Rate 

Although Kocher’s style of attack is impressive in its effectiveness, it is not the most 
sensitive DPA attack possible, and hence is not suited to determining the effectiveness 
of a countermeasure. In particular, an approach of optimal detection of a signal in 
noise will extract the maximum information theoretically available from the 
observation, under the assumption that the attacker has full knowledge of the device 
characteristics excluding the secret information of interest. 

An information-theoretic approach gives a sound basis for determining the 
maximum amount of information that can leak about a datum from a circuit using a 
realistic model of the way in which observable waveforms are determined by internal 
activities and noise. Suresh et al. [4] suggest such an approach, which can be 
rigorously formalised. Without needing to put accurate figures to the leakage process, 
we can determine how certain strategies affect the upper bound on the information 
leaked about a cryptographic key or other data. Strategies that reduce this bound to 
acceptable levels (e.g. to a defined fraction of a secret key during the life of the key) 
can then be found. 

Both the information-theoretic model and DPA analyses on standard hardware 
performed by us and others show that it is disconcertingly easy to extract data from a 
typical smart card using simple equipment and without operating the smart card in a 
fashion that it could detect as abnormal. 

If we need to observe several intermediate variables to assemble any information 
about the original data and each of these variables has maximal entropy, the amount 
of observation needed to reduce the uncertainty in the original data grows as a 
power of the single-variable requirement, with the number of variables that must be 
observed as the exponent. Thus if we need 100 noisy 
observations to obtain enough information on a physical intermediate variable, we 
would need 1 000 000 similar observations if we are forced to observe a minimum of 
three distinct intermediate variables to obtain the same information. 
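Written as a formula, with $N_1$ the number of observations needed for one variable and $N_k$ the number needed when k variables must be combined (both symbols are introduced here for illustration), the example reads:

```latex
N_k \approx N_1^{\,k}, \qquad
N_1 = 100,\; k = 3 \;\Longrightarrow\; N_3 \approx 100^{3} = 10^{6}.
```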



1.3 The Technique 

The technique presented here comprises applying changing random affine mappings 
to the data being operated upon, thus making internal variables unpredictable 
(“mapped” or “obscured”). The operators themselves generally remain unchanged, 
but the content of lookup tables must be replaced to operate on the mapped data. The 
concept of random affine mappings is in itself not new. The well-known idea of a 
random affine constant (e.g. x’ = x + b) for data hiding as defined in relation to an 
operator is extended here to include an invertible linear constant (e.g. x’ = ax + b). 
This extension applies to essentially any operator to which the concept of “linearity” 
can be extended. Thus, finite field addition and multiplication are both clear 
candidates. The linearity in relation to a multiplicative operator generally results in 
an expression of the form x’ = bx^a, and in relation to the exponentiation operator is as 
for multiplication for the base and appears to be restricted to multiplication for the 
exponent. (For RSA exponentiation, it may be more effective to express the exponent 
as an arbitrary sum of two shares due to the need to avoid use of the applicable 
modulus, φ(n).) The mapping is chosen such that the desired result can still be 
extracted knowing the mappings that apply to the input data. The exclusive-or 
operator appears to allow more entropy in the selection of the mapping for a given 
size of operand than any other operator does: 70.2 bits of random freedom when 
mapping an octet (mostly due to a random linear matrix A in the expression 
x’ = Ax + c). In contrast, modulo-256 addition allows only 15 bits of freedom (the 
constant a being restricted to odd values). There is merit in having large entropy of 
mapping selection when many variables are to be hidden using the same or a closely 
related mapping (as may be useful, e.g. to avoid re-mapping of data). 
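The two entropy figures can be reproduced by counting mappings. The sketch below assumes that the 70.2 bits count all maps x’ = Ax + c with A an invertible 8 x 8 matrix over GF(2) and c an arbitrary octet, and that the 15 bits count maps x’ = ax + b modulo 256 with a odd; the function names are illustrative.

```python
import math

def xor_mapping_bits(width: int) -> float:
    """log2 of the number of maps x' = Ax + c with A invertible over GF(2)."""
    invertible = 1
    for i in range(width):
        # Row i must lie outside the span of the i rows already chosen.
        invertible *= (1 << width) - (1 << i)
    return math.log2(invertible) + width      # plus `width` bits for c

def modular_add_mapping_bits(width: int) -> int:
    """log2 of the number of maps x' = a*x + b mod 2**width with a odd."""
    return (width - 1) + width                # odd a, arbitrary b

xor_bits = xor_mapping_bits(8)                # about 70.2
add_bits = modular_add_mapping_bits(8)        # 15
```

About 62.2 of the 70.2 bits come from the choice of the matrix A, which matches the remark that the freedom is mostly due to the random linear matrix.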

Upon each use (at times chosen by design), the random mappings on the data are 
replaced, removing any fixed target data to observe, but in a way that retains the 
relations in the mapped data that allow reconstruction of the underlying data. 
Mappings are applied and removed at the input and output only - e.g. to incoming 
ciphertext to be deciphered and to outgoing ciphertext to be transmitted, and internal 
plaintext that need not remain secret but must be processed. Cryptographic keys are 
at no point represented in their original form in the field (even internally and when 
downloaded), and only a difference mapping is applied to the key data. 
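A minimal sketch of an exclusive-or computed entirely on mapped data is given below. It assumes that key and data share the same random invertible matrix A but carry different constants, which is one possible reading of "the same or a closely related mapping"; the representation of matrices as lists of row bitmasks and all names are choices made for this illustration.

```python
import random

WIDTH = 8

def apply(matrix, x):
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit-vector x."""
    y = 0
    for i, row in enumerate(matrix):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def inverse(matrix):
    """Gauss-Jordan inverse over GF(2); returns None if the matrix is singular."""
    rows = [(row, 1 << i) for i, row in enumerate(matrix)]   # [A | I]
    for col in range(WIDTH):
        pivot = next((r for r in range(col, WIDTH) if (rows[r][0] >> col) & 1), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(WIDTH):
            if r != col and (rows[r][0] >> col) & 1:
                rows[r] = (rows[r][0] ^ rows[col][0], rows[r][1] ^ rows[col][1])
    return [right for _, right in rows]

def random_mapping(rng):
    """Draw x' = Ax + c with A invertible; returns (A, A_inverse, c)."""
    while True:
        a = [rng.randrange(1 << WIDTH) for _ in range(WIDTH)]
        a_inv = inverse(a)
        if a_inv is not None:
            return a, a_inv, rng.randrange(1 << WIDTH)

rng = random.Random(7)
A, A_inv, c_key = random_mapping(rng)
c_data = rng.randrange(1 << WIDTH)

key, data = 0x3A, 0xC5
mapped_key = apply(A, key) ^ c_key            # how the key would be stored
mapped_data = apply(A, data) ^ c_data         # mapping applied at the input
mapped_result = mapped_key ^ mapped_data      # the operator itself is unchanged
recovered = apply(A_inv, mapped_result ^ c_key ^ c_data)   # removed at the output
```

Because the exclusive-or is linear over GF(2), the mapped result equals A(key ⊕ data) ⊕ (c_key ⊕ c_data), so the true result is recoverable from the mapping parameters alone and the unmapped key never appears as an intermediate value.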

This technique substantially reduces the effectiveness of DPA. With care in the 
choice of the computational primitives, the added computational and storage costs can 
be remarkably low. The technique has been demonstrated to be effective, practical 
and economic. 

As is to be expected, the technique is not effective for hardware that is shielded so 
poorly that intermediate variables may be determined from a single waveform. 

Data most readily protected using this technique are cryptographic keys and data 
for storage. One significant advantage of the approach is that there is no system-level 
impact (such as the introduction of message sequence numbering). 



2 Discussion of Information-Theoretic Analysis 

This section illustrates the information-theoretic approach that can be used to evaluate 
the effectiveness of a defence against any DPA attack based on a mathematical model 
of signal leakage. 

2.1 Defining Information Leakage 

We wish to define an accurate measure of information leakage for purposes of 
determining the vulnerability of any device to a DPA attack. Measurements made on 
real devices can allow quantification of the amount of information potentially leaked 
by those devices, and in particular, the gains provided by a countermeasure can be 
determined. 

For the purposes of this topic, the information leaked about a key by an 
observation will be defined in the information-theoretic sense. We define it as the 
reduction in entropy of the key when given the observation. The entropy of a variable 
is a measure of how much is not yet known about it - it is the amount of information 
required to fully specify it from what is already known. This definition ignores any 
complexity barriers that the attacker may face, and it is appropriate when trying to 
establish an upper limit to the information obtainable from side-channels from the 
device. 

The entropy H of a key (in bits) may be expressed as follows. The expression uses 
the probability $P_i$ that the attacker would assign to the i-th possible value of the key 
with all the knowledge at his disposal including his prior knowledge and all 
observations. This is a standard and well-established definition of entropy: 

\[ H = -\sum_i P_i \log_2 P_i \tag{1} \]
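As a small illustration of this definition, the sketch below computes the entropy of a probability assignment over key values and the leakage as the reduction in entropy between a prior and a posterior assignment. The function names and the example distributions are assumptions made for illustration.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability values contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def leakage_bits(prior, posterior):
    """Information leaked by an observation: the reduction in entropy."""
    return entropy_bits(prior) - entropy_bits(posterior)

uniform_octet = [1 / 256] * 256                 # nothing known about an 8-bit key
prior = [0.25, 0.25, 0.25, 0.25]                # four equally likely key values
posterior = [0.5, 0.5, 0.0, 0.0]                # two values ruled out
leaked = leakage_bits(prior, posterior)
```

Here an unknown octet has 8 bits of entropy, and ruling out half of four equally likely values leaks exactly one bit.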



2.2 A Formal Model of Leakage 

A formal model of information leakage requires an accurate description of the 
relationship of the internal digital state of the device and the possible observed 
waveforms. For each of the possible internal states (by which is meant a distinct 
possible sequence of all internal variables), there will be an associated ensemble of 
waveforms due to added noise. An ensemble is essentially a set with an associated 
probability or probability density for each element of the set. When the waveform is 
in the form of a sequence of n samples, the ensemble may be mathematically 
described as a joint probability density function in an n-dimensional state space, with 
each sample determining one of the n coordinates. There is one such joint probability 
density function associated with each possible internal state of the device. 

A waveform observed by the attacker determines a set of coordinates in the state 
space, from which a probability density value may be determined for each possible 
internal state. The probabilities of each of the possible internal states given the 
observation are these density values, normalised to sum to unity. These probabilities will then be 
modified by subsequent observed waveforms (by multiplying with the density values 
and scaling), producing an overall probability of each possible internal state. 
Summing the probabilities of all the internal states associated with a particular value 
of a key gives the probability of that value of the key. An attacker need only observe 
as many waveforms as is necessary to make the probability that the key takes on one 
value approach certainty. 
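The update described above can be sketched as follows. The example assumes independent Gaussian noise on each sample so that the joint density has a simple closed form; the state labels, expected waveforms and noise level are invented for illustration and do not come from any measured device.

```python
import math

def gaussian_density(waveform, expected, sigma):
    """Joint density of n independent Gaussian samples around `expected`."""
    norm = (2 * math.pi * sigma ** 2) ** (-len(waveform) / 2)
    squared_error = sum((y - s) ** 2 for y, s in zip(waveform, expected))
    return norm * math.exp(-squared_error / (2 * sigma ** 2))

def update(probabilities, waveform, expected_by_state, sigma):
    """Multiply each state probability by its density value, then normalise."""
    weighted = {
        state: p * gaussian_density(waveform, expected_by_state[state], sigma)
        for state, p in probabilities.items()
    }
    total = sum(weighted.values())
    return {state: w / total for state, w in weighted.items()}

def key_probabilities(state_probabilities, key_of_state):
    """Sum the probabilities of all internal states sharing a key value."""
    totals = {}
    for state, p in state_probabilities.items():
        key = key_of_state[state]
        totals[key] = totals.get(key, 0.0) + p
    return totals

# Hypothetical internal states (key bit, other fixed unknown bit) and the
# two-sample waveform each would produce without noise.
expected = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0],
            (1, 0): [1.0, 0.0], (1, 1): [1.0, 1.0]}
probs = {state: 0.25 for state in expected}
for observed in ([0.9, 0.2], [1.1, 0.8], [0.8, 0.1]):
    probs = update(probs, observed, expected, sigma=0.5)
key_probs = key_probabilities(probs, {state: state[0] for state in expected})
```

After three observations the probability mass concentrates on the states with key bit 1, even though the other unknown bit remains less certain.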

In evaluating the vulnerability of a device to DPA, we must determine the average 
reduction in the entropy in such an attack. For this, we must find the ensemble 
average (i.e. weighted by the probability of each possible set of waveforms) of the 
information leaked (the reduction in entropy) over all the possible observed 
waveforms. 

It must be stressed here that since this is a general measure of information leakage, 
this model assesses the vulnerability of a device to the optimum DPA attack (i.e. the 
order of the attack is immaterial). 

Since these states are distinguished by a large amount of data unknown to the 
attacker (such as key values and internally generated random data) and each joint 
probability function has enormous dimensionality, this description as it stands would 
appear to be intractable. Yet it serves as a rigorous starting point. From this the 
necessary understanding may be derived, and realistic simplifying assumptions may 
be added to make the analysis more tractable. 

2.3 A Realistic Simplified Model to Aid a DPA Attack 

For the purposes of analysis, we can make a number of realistic simplifying 
assumptions. Foremost amongst these is the assumption that the noise is wide-band 
(spectrally white), has a normal distribution and is additive. Even when the noise 
does not have a flat spectrum (or equivalently, when the noise contributions at different 
points in a waveform are correlated), whiteness may generally be arranged by means of 
a suitable filter (the noise probability density function at each point will remain a 
normal distribution). A further simplifying assumption is that the noise is stationary 
(of constant amplitude), and this too can be arranged by pre-processing if necessary. 

Under these assumptions, standard matched filtering may be used to correctly 
determine the probability of each possible internal state. In this method, the expected 
(ensemble average) waveform for each internal state is multiplied with the observed 
waveform and integrated. Finding the mean square error is equivalent. The resultant 
set of scalar values (one value for each internal state) can be used to update the 
probability associated with all the internal states. 
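The equivalence between matched filtering and the squared error can be checked numerically. The sketch below assumes white Gaussian noise of known standard deviation and equally likely internal states; the expected waveforms are invented for illustration. The matched-filter score is the correlation with the expected waveform, corrected by half the energy of that waveform.

```python
import math

def _normalise(log_scores):
    """Turn log-domain scores into probabilities that sum to one."""
    peak = max(log_scores.values())
    weights = {s: math.exp(v - peak) for s, v in log_scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def posterior_from_squared_error(waveform, expected_by_state, sigma):
    scores = {
        s: -sum((y - e) ** 2 for y, e in zip(waveform, template)) / (2 * sigma ** 2)
        for s, template in expected_by_state.items()
    }
    return _normalise(scores)

def posterior_from_matched_filter(waveform, expected_by_state, sigma):
    scores = {}
    for s, template in expected_by_state.items():
        correlation = sum(y * e for y, e in zip(waveform, template))
        energy = sum(e * e for e in template)
        scores[s] = (correlation - energy / 2) / sigma ** 2
    return _normalise(scores)

templates = {"state_a": [0.0, 1.0, 0.5, 0.0],
             "state_b": [1.0, 0.0, 0.5, 1.0],
             "state_c": [0.5, 0.5, 1.5, 0.0]}
observed = [0.2, 0.9, 0.7, -0.1]
by_error = posterior_from_squared_error(observed, templates, sigma=0.8)
by_filter = posterior_from_matched_filter(observed, templates, sigma=0.8)
```

The two scores differ only by a term that depends on the observed waveform alone, which cancels when the probabilities are normalised.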

This approach remains intractable when considering all possible key values when 
the number of bits in the key is in a normal range (e.g. 56 bits for DES). However, 
this approach does suggest a viable attack on small portions of the key as they become 
incorporated into the processing. In the interests of simplifying an attack, the 
waveforms may be processed in small portions, at each step refining the probability 
estimates of initially only a few bits of the key. Only the most probable values of the 
bits in question need to be considered, since the option of backtracking exists when it 
appears that incorrect choices have been made. This approach is very similar to a 
standard signal decoding technique. 

It may be noted at this point that unknown random data may influence the 
processing sequence of the processor, and is part of the internal state. These data 
would have to be determined as well as the key bits. Yet this fits neatly into the 
process of progressively refining the estimation of the internal state of the device. 
Where there is some ambiguity in this data, it will be reflected in several internal 
states having moderate probability. This type of data however typically has the most 
visible effect on a waveform, and it is difficult to disguise it in one waveform. 

At the cost of sensitivity of the technique, the contribution of all digital processes 
not considered may be included in the effective noise. In a real attack, a device may 
be characterised while it is processing known data with known keys, which may have 
been determined by other means. This will provide a good estimate of the waveforms 
to be used in the matched filtering. This reduction in sensitivity is often tolerable, 
while the simplification is well worthwhile. 

A point being made here is that the information-theoretic model suggests a DPA 
attack that is viable and has near-optimal sensitivity. The information-theoretic 
bound on the information leakage is therefore appropriate for evaluating DPA 
countermeasures. 

2.4 Determining the Information Leakage 

The attacker can accurately determine, in principle and in practice, the waveforms 
generated by the possible digital processes and the probability density function 
associated with the waveform samples. Unrelated digital processes in the device add 
considerably to the variation in the observed signal, but as this contribution is (at least 
in principle) predictable, it cannot be relied upon to add to the unpredictability of the 
observation. 

It is assumed that the unpredictable component of the waveform (the additive 
noise) is large relative to the portion of the signal that allows differentiation between 
distinct waveforms. If this were not the case, the data being processed would be 
deducible from a single waveform. The applicable criterion here is the energy in the 
differential signal divided by the noise power spectral density rather than simple 
amplitudes. 

Prior to any observations, we assume that the entropy of the key equals the number 
of bits in the key. The entropy of the key prior to an observation is calculated from 
the a-priori probabilities of each possible value of the key. The probabilities, as 
modified by the observation, are the a-posteriori probabilities of the key for the 
observation. We treat the a-posteriori probabilities for one observation as the a-priori 
probabilities of the next observation. Provided the observations are not correlated 
other than through the digital process being modelled, the a-posteriori probabilities are 
the a-priori probabilities multiplied by the corresponding probability densities of the 
observation after normalisation. 

The information leakage due to a set of observations (or equivalently the reduction 
in the entropy) is the ensemble-average (weighted by the probability density of the 
waveforms given the a-priori probabilities of the values of the key) over all 
waveforms of the reduction in entropy. The magnitude of this leakage can be used to 
rate the effectiveness of a strategy as a DPA countermeasure. 

2.5 Information Leakage: An Example 

Two simple scenarios of DPA leakage are presented in Table 1. 
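
Table 1. An example of reducing information leakage 

A: DPA on one bit 

  Secret data: $x = x_1 = x_2$ 
  Noise amplitude: $\sigma_{noise} = 1$ 
  Signal amplitudes: $c_1 = c_2 = 1$ 
  A-priori probabilities: P(x=0) = P(x=1) = 0.5 
  Entropy: $H(x) = -\log_2 0.5 = 1$ bit 
  Observed samples: $y_1$ and $y_2$ 

  The explicit form of the distributions is: 

  \[ p(y_1, y_2 \mid x = 0) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 - c_1)^2 + (y_2 - c_2)^2}{2\sigma^2} \right) \]
  \[ p(y_1, y_2 \mid x = 1) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 + c_1)^2 + (y_2 + c_2)^2}{2\sigma^2} \right) \]

  Average contribution by $(y_1, y_2)$ to leakage: $I(x; y_1, y_2)\, p(y_1, y_2)$ 
  H(x) = 1 bit 
  R(x) = 0.75 bits 

B: DPA on data hiding of one bit 

  Secret data: $x = x_1 \oplus x_2$ 
  Noise amplitude: $\sigma_{noise} = 1$ 
  Signal amplitudes: $c_1 = c_2 = 1$ 
  A-priori probabilities: P(x=0) = P(x=1) = 0.5 
  Entropy: $H(x) = -\log_2 0.5 = 1$ bit 
  Observed samples: $y_1$ and $y_2$ 

  The explicit form of the distributions is: 

  \[ p(y_1, y_2 \mid x_1 = 0, x_2 = 0) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 - c_1)^2 + (y_2 - c_2)^2}{2\sigma^2} \right) \]
  \[ p(y_1, y_2 \mid x_1 = 0, x_2 = 1) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 - c_1)^2 + (y_2 + c_2)^2}{2\sigma^2} \right) \]
  \[ p(y_1, y_2 \mid x_1 = 1, x_2 = 0) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 + c_1)^2 + (y_2 - c_2)^2}{2\sigma^2} \right) \]
  \[ p(y_1, y_2 \mid x_1 = 1, x_2 = 1) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(y_1 + c_1)^2 + (y_2 + c_2)^2}{2\sigma^2} \right) \]

  Average contribution by $(y_1, y_2)$ to leakage: $I(x; y_1, y_2)\, p(y_1, y_2)$ 
  H(x) = 1 bit 
  R(x) = 0.25 bits 
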


With reference to these scenarios, equation (2) expresses the reduction in the 
entropy of $x$ obtained by observing $y_1$ and $y_2$. 

$$I(x; y_1, y_2) = H(x) - H(x \mid y_1, y_2) \qquad (2)$$ 

Let $R(x)$ denote the mean reduction in the entropy of $x$. In order to determine $R(x)$, 
the above function must be multiplied by† $p(y_1, y_2)$ and the result must be integrated 
over all $(y_1, y_2)$ using the definite integral, as shown here. 

$$R(x) = \iint p(y_1, y_2)\, I(x; y_1, y_2)\, dy_1\, dy_2 \qquad (3)$$ 

In attempting to combat DPA attacks, the objective would clearly be to minimise $R(x)$. 
In the case of the two scenarios, it is evident that Scenario B (which uses data hiding) 
leaks relatively little information (0.25 bits) after two observations. In contrast, 
Scenario A leaks 0.75 bits of information with the same number of samples (we 
could have used one sample in Scenario A, which would have led to a value closer to 
0.5 bits). It can be shown that for low leakage, the leakage rate is proportional to 
$\mathrm{SNR}^k$ for a $k$-share exclusive-or mechanism, where SNR is the ratio of the signal 
power to noise power in the samples of the example. As the SNR is reduced, the 
information leakage rate of Scenario A ($k = 1$) reduces roughly linearly, whereas that of 
Scenario B ($k = 2$) reduces roughly quadratically and is hence substantially less at low SNR. 
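
The mean leakage of the two scenarios can be estimated numerically. The sketch below is not taken from the paper: it assumes that a bit value of 0 contributes a mean of +c and a bit value of 1 a mean of -c to its sample, that $x_1$ and $x_2$ are uniformly distributed in Scenario B, and it approximates the double integral for $R(x)$ by a midpoint rule on a finite grid. The function names and grid parameters are choices made here.

```python
import math

def gauss(y, mean, sigma):
    """One-dimensional Gaussian density."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def binary_entropy(p):
    """Entropy in bits of a binary variable with P(0) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mean_leakage(scenario, c=1.0, sigma=1.0, span=8.0, step=0.05):
    """Approximate R(x), the integral of p(y1, y2) * I(x; y1, y2) over the grid."""
    n = int(round(2 * span / step))
    ys = [-span + (i + 0.5) * step for i in range(n)]
    g0 = [gauss(y, +c, sigma) for y in ys]      # sample density when its bit is 0
    g1 = [gauss(y, -c, sigma) for y in ys]      # sample density when its bit is 1
    total = 0.0
    for i in range(n):
        for j in range(n):
            if scenario == "A":                 # x = x1 = x2
                p_given_0 = g0[i] * g0[j]
                p_given_1 = g1[i] * g1[j]
            else:                               # x = x1 XOR x2
                p_given_0 = 0.5 * (g0[i] * g0[j] + g1[i] * g1[j])
                p_given_1 = 0.5 * (g0[i] * g1[j] + g1[i] * g0[j])
            p_y = 0.5 * (p_given_0 + p_given_1)
            if p_y > 0.0:
                posterior_0 = 0.5 * p_given_0 / p_y
                # I(x; y1, y2) = H(x) - H(x | y1, y2), with H(x) = 1 bit
                total += p_y * (1.0 - binary_entropy(posterior_0)) * step * step
    return total

leak_a = mean_leakage("A")
leak_b = mean_leakage("B")
```

The two results can be compared with the values of $R(x)$ quoted for Scenarios A and B. Repeating the calculation with a larger `sigma` shows how much faster the leakage of the data-hiding scenario falls as the SNR is reduced.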



3 Simple Defences 

A number of defences have been proposed to enhance DPA resistance. These include 
measures such as making the device draw a constant current, using shielding, 
randomly varying the processing time of any particular intermediate variable, adding 
random noise to the observable waveforms, and reducing the amplitude of the signal 
generated by the hardware by careful design. 

These defences can have no greater effect than to reduce the effective signal-to-noise 
ratio (SNR, defined in relation to power or energy) of the observed waveforms. 
Even combined together in a device such as a smart card, they do no more than make 
the attacker’s task more difficult (each multiplying the number of waveforms the 
attacker needs by a small number), and should not be considered as a final solution. 
Computational defences are needed for effective defence against DPA, as suggested 
in [5]. 

† Since the joint observations $(y_1, y_2)$ occur with varying degrees of probability, a joint 
probability is required. 

Such defences (those that effectively reduce the signal-to-noise ratio) have value 
when combined with computational techniques. The computational techniques 
effectively raise the signal-to-noise ratio to a small power, e.g. squaring it, along with 
making the attack more complex (e.g. by making any first-order attack such as 
Kocher’s general attack non-viable). A first-order DPA attack has this distinguishing 
characteristic: any number of waveforms may be averaged in each of the groups into 
which they are divided, and its effectiveness climbs in relation to the effective 
improvement in signal-to-noise ratio produced by this averaging. 
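
The benefit of averaging in a first-order attack can be made concrete with a short worked equation. It is a sketch under assumptions not stated in the original: each sample is modelled as a fixed differential signal $s$ plus independent zero-mean noise $n_j$ of variance $\sigma^2$, and $N$ is the number of waveforms averaged within a group.

```latex
\bar{y} = \frac{1}{N}\sum_{j=1}^{N} (s + n_j), \qquad
\mathrm{E}[\bar{y}] = s, \qquad
\mathrm{Var}[\bar{y}] = \frac{\sigma^2}{N}
\quad\Longrightarrow\quad
\mathrm{SNR}_{\mathrm{avg}} = \frac{s^2}{\sigma^2 / N} = N \cdot \mathrm{SNR}
```

Under this model, a defence that reduces the SNR by some factor can be offset by averaging proportionally more waveforms, which is why the simple defences of this section only multiply the number of waveforms an attacker needs.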



4 The Proposed Technique 

The technique proposed here provides a practical and effective modification of 
cryptographic processes. It is based on encrypting the data being processed by frequently 
varying the mapping of all such data onto an obscured form for computation and 
storage. Examples of such data are cryptographic keys and stored or communicated 
data. 

Both the mapped (obscured) data and the chosen mapping must be partially known 
before any information about the secret data becomes accessible. 

Secret data, such as cryptographic keys, are never needed in the non-obscured 
form, and should be randomly re-mapped on each use to avoid data repetition that 
would facilitate a DPA attack. An example of where this technique has high 
value is in smart cards, where DPA can provide an unauthorised party with a 
cryptographic key in use within minutes, entirely through analysis of the leaked 
signals. 

The method includes the following steps: 

• Design of algorithms, particularly ciphers, for maximum benefit from this 
technique; 

• Modifying the algorithm implementation to operate on mapped data; 

• Initial mapping of data (especially cryptographic keys) for storage; 

• Frequently changing the data mapping from the prior data mapping by use of a 
secondary (or delta) mapping; 

• Mapping incoming data for input to the modified algorithm implementation; and 

• Mapping of data output from the modified algorithm for further use. 



4.1 Cipher Design 

Care must be exercised in choice of a cipher algorithm. Suitable cipher design can 
result in the next step (cipher modification) adding very little processing overhead. 
Choosing the set of operations that are used in the cipher is important for minimising 
complexity and maximising data hiding. Understanding of the following aspects of 
the technique is essential during the design. 







M. von Willich 



4.2 Cipher Modification 

A different mapping may be used for every operation throughout the cipher, thus 
necessitating a change of the mapping on every data path. Alternatively, the mapping 
may be left unchanged between two operations. The latter is typically not possible 
when the two operations are unrelated, but when possible is useful in keeping the 
degree of complexity low. 

Every operation is substituted with one that performs the equivalent operation with 
all values being mapped, as illustrated in Fig. 2. The output mapping ($f_c$) is 
determined by the input mappings ($f_a$ and $f_b$) and any changes to the core operation. 
For example, adding a random value to each of the inputs of an addition is reversed 
on the output by subtracting the sum of the random values from the output. 

The non-obscured values ($a$, $b$ and $c$) still occur in Fig. 2, and at this point are only 
obscured during each individual operation, but are removed in the next step. The 
operation performed on the mapped values will normally be the same operation as 
before (e.g. addition), but may be different when the operator does not have the 
necessary properties (e.g. an arbitrary lookup table). 
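
The addition example above can be sketched as follows. This is an illustrative sketch only: the 8-bit modulus, the operand values and the function names are assumptions made here, and a real device would draw its random values from a hardware source.

```python
import secrets

MOD = 256  # 8-bit arithmetic, an assumption for this sketch

def map_in(value, r):
    """Input mapping: add a random value r (mod 256)."""
    return (value + r) % MOD

def mapped_add(a_mapped, b_mapped):
    """The core operation is unchanged: an ordinary addition on mapped values."""
    return (a_mapped + b_mapped) % MOD

def map_out(c_mapped, r_c):
    """Reverse the output mapping by subtracting the combined random value."""
    return (c_mapped - r_c) % MOD

a, b = 0x3C, 0xA7                        # example operands
r_a = secrets.randbelow(MOD)             # random value for the first input
r_b = secrets.randbelow(MOD)             # random value for the second input
c_mapped = mapped_add(map_in(a, r_a), map_in(b, r_b))
r_c = (r_a + r_b) % MOD                  # output mapping follows from the input mappings
c = map_out(c_mapped, r_c)               # equals (a + b) mod 256
```

In the full technique the final `map_out` step would be merged with the input mapping of the next operation, so that the non-obscured sum never appears.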




Fig. 2. Replacement of a two-input operation with a data-hiding equivalent 



The next step is to combine consecutive mappings from adjacent operations into a 
single mapping that does not, even as an intermediate calculation value, derive the 
non-obscured data. Occurrence of the non-obscured data would provide a primary 
target for a DPA attack. Where the consecutive mappings are closely related the 
composite mapping may be somewhat simpler or even become the identity operation 
(and hence can be omitted, as is preferable). This is illustrated in Fig. 3. 

Fig. 3. Combining of consecutive mappings (figure label: data value not obscured by a mapping) 

If necessary, the combined mapping is implemented using a look-up table. If 
cascaded lookup tables occur in this process, they may be combined into one lookup 
table. After this step, aside from the input data, key-data and output data, the data in 
all computations are obscured by the mappings. These external mappings are treated 
separately in subsequent steps. 
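
Combining an un-mapping and a re-mapping into one lookup table can be sketched as below. The byte-wide permutation tables, the fixed seed and the helper names are assumptions made for illustration; Python's `random` module is used only for reproducibility and is not suitable for generating real mappings.

```python
import random

rng = random.Random(2001)                # fixed seed, for a reproducible illustration only

def random_permutation():
    """A random byte-to-byte mapping, represented as a 256-entry table."""
    table = list(range(256))
    rng.shuffle(table)
    return table

def invert(table):
    """Table of the inverse mapping."""
    inverse = [0] * 256
    for x, y in enumerate(table):
        inverse[y] = x
    return inverse

def compose(first, second):
    """Single table equivalent to applying `first` and then `second`."""
    return [second[first[x]] for x in range(256)]

f_out = random_permutation()             # output mapping of one operation
f_in = random_permutation()              # input mapping of the next operation

# Un-mapping followed by re-mapping, merged into one table when the tables are built.
remap = compose(invert(f_out), f_in)

mapped_value = f_out[0x5A]               # a value as it leaves the first operation
next_input = remap[mapped_value]         # re-mapped in one step, 0x5A itself never appears
```

Because `remap` is built by iterating over all 256 table entries, its construction does not depend on the data being processed.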

Special care must be taken with lookup tables. Where there may be a non-linear 
interaction (in the analogue sense) between the input and output, certain “parasitic” 
intermediate variables may be produced. An example is a microprocessor that has 
a multiplexed bus, such as the 8051 derivatives, where the low-order eight address bits 
on the single bus are replaced by the looked-up value. A transition on a bit of the bus 
at this point results in a pulse of charging current that is the exclusive-or combination 
of the two values. For this reason, the input and output of the lookup table should at 
minimum be masked with unrelated affine constants. A second point to note is that if a 
lookup table is used many times before it is re-mapped, there may be a second-order 
DPA attack based on the structure of the table itself leaking information about the 
mappings used. This will not normally be significant compared to the information 
leaked about the mappings in the input and output mappings, but where these 
mappings are cascaded to provide protection against a higher-order DPA attack, this 
leakage must be considered. 

With careful choice of cipher design and restrictions on the chosen mappings, the 
resulting complexity need not be much greater than that of the original cipher. 
Additionally, computation relating to the mapping used in each computation may be 
kept to a minimum. The resulting mathematically equivalent cipher is shown in Fig. 4. 

Fig. 4. Replacement of a cipher by its modified equivalent 



4.3 Initial Storage of Keys 

In Fig. 4, the key, input data and output data are still shown in non-hidden form, and 
may still be the target of a DPA attack when they are read and written. The cryptographic 
key must be stored in an obscured form where the choice of mapping includes 
randomness. Additionally, the information encoding the choice of mapping must be 
stored. This initial step is only needed once with initial or master keys downloaded 
(typically in a protected environment), and never for keys downloaded using 
encrypted messages.† This may be expressed as storing the key $k$ in its secret form 
$k'_0 = f_0(k)$, as well as information identifying the choice of mapping, $f_0$. This mapping 
will most commonly be chosen in relation to the operators used in the cipher in which 
the key is used, to avoid unnecessary re-mapping. Refer to Fig. 5. 



Fig. 5. Initial mapping of the key for storage (figure labels: non-obscured; obscured; mapped values to be stored) 



† Refer to Section 4.6: Cipher Output Data Mapping. 



4.4 Per-Use Key Mapping 

Even when the key is stored in obscured form as described in Section 4.3, repeated retrieval would 
allow both the secret data and the mapping information to be reconstructed through 
DPA techniques. Consequently, prior to using a cryptographic key, the mapping must 
be replaced with a fresh, randomly chosen mapping subject to the constraints imposed 
by the cipher, preferably every time the key is to be used. It is important that the 
original value of the key should not be computed during this process, even as a 
temporary variable. This leads to a derivation of values in the form $k'_i = g_i(k'_{i-1})$ and 
$f_i = g_i \circ f_{i-1}$. The latter is equivalent to saying $f_i(q) = g_i(f_{i-1}(q))$ for any $q$. The values $k'_i$ 
and $f_i$ will replace the stored values $k'_{i-1}$ and $f_{i-1}$. They remain related by the identity 
$k'_i = f_i(k)$. This process is illustrated in Fig. 6. 




Fig. 6. Iterative mapping of a key 
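
The per-use re-mapping of Section 4.4 can be sketched with mappings represented as byte permutation tables. This representation, the key octets and the function names are assumptions made here for illustration; the cipher-specific constraints on the choice of mapping are ignored.

```python
import secrets

def random_permutation():
    """Fresh random byte permutation (Fisher-Yates shuffle)."""
    table = list(range(256))
    for i in range(255, 0, -1):
        j = secrets.randbelow(i + 1)
        table[i], table[j] = table[j], table[i]
    return table

def remap(stored_key, stored_f):
    """Replace the stored (mapped key, mapping) pair without reconstructing the key."""
    g = random_permutation()                      # fresh delta mapping g
    new_key = [g[octet] for octet in stored_key]  # new mapped key = g(old mapped key)
    new_f = [g[v] for v in stored_f]              # new mapping = g composed with old mapping
    return new_key, new_f

key = [0x13, 0x57, 0x9B, 0xDF]           # example key, used only for set-up and checking
f = random_permutation()                 # initial mapping
stored = [f[octet] for octet in key]     # initial mapped key, as stored
for _ in range(3):                       # three successive uses of the key
    stored, f = remap(stored, f)
```

After any number of calls to `remap`, the stored octets still equal the current mapping applied to the key, although `remap` itself never handles the non-obscured key.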



4.5 Cipher Input Data Mapping 

The input data (x in Fig. 4) is first mapped using the mapping chosen for those inputs. 
This is analogous to the initial mapping of the key (as in Section 4.3), but may occur 
with all data to be processed, such as received ciphertext to be decrypted, or plaintext 
to be encrypted for transmission. Where sensitive information (e.g. keys) is to be 
encrypted, it must already be in secret form and a mapping substitution should be 
performed where appropriate (as in Section 4.4). 



4.6 Cipher Output Data Mapping 

The output may be mapped to its non-obscured value where it is not critical to hide its 
value (e.g., where ciphertext to be transmitted has been generated). Where this data 
must remain hidden (e.g. secret data, especially received cryptographic keys being 
downloaded, etc.), it should be stored without mapping it back to the non-obscured 
form, along with the mapping information. This mapping information must 
correspond to the required form for use with a key. Thus, the initial mapping of the 
key mentioned in Section 4.3 does not occur explicitly with received and decrypted 
keys. This makes the process of downloading keys resistant to DPA. 



4.7 Higher-Order DPA Defences 

Where the mapping data is used only externally to the cipher (as can be the case with 
careful design), multiple independent mappings may be applied in cascade (one after 
the other). This diminishes the usefulness of observations of the random mapping 
data used. Thus, an attack exploiting the mapping process may easily be defended 
against to any desired degree. 

This does not ensure that the aggregate mapping data cannot be obtained from the 
ciphering operation. Such information may be obtained from the structure of mapped 
lookup tables, for example. Cascading lookup tables does not alleviate this problem, 
as only the input and the output of the cascade are needed to determine the structure 
of the cascade and hence the cascaded mapping. 

Care must be taken to avoid low-order information leakage, for example where 
multiple vectors are mapped with the same mappings. This is most significant where 
key data and input data share the same mapping - something best avoided. A simple 
example is finding a correlation between mapped input data values and mapped key 
data processed in the same data paths in a second-order attack. 



5 A Practical Design 

A proprietary real-world cryptographic cipher was designed within tight 
computational constraints to exhibit first-order DPA-resistance using the insights and 
techniques presented here. There was no computational overhead due to the affine 
mapping in the cipher itself, while mapping of input and output data added a small 
percentage to the overall processing time. A non-prohibitive fixed preparation 
overhead was also needed for random number generation and pre-calculation. 
Despite the restrictive design criteria, the resultant cipher has survived classic linear 
and differential cryptanalysis with flying colours. 

As the most promising operator, an 8-bit exclusive-or was combined with 8-bit 
lookup tables for a practical cipher design. Input keys are already randomly pre-mapped 
on a per-octet basis for storage, whereas incoming ciphertext and plaintext 
are mapped prior to processing in the cipher proper. Lookup tables must be 
recalculated according to the applicable mappings. Every octet $x$ is replaced 
in the real computation by its mapped equivalent $x' = Ax + b$. The 8-by-8 matrix $A$ is 
a randomly chosen non-singular matrix used throughout the decryption or encryption 
of one ciphertext, and $b$ is one of many random 8-bit constants chosen independently 
where practical for mapping different data being processed in the cipher. The matrix 
$A$ is attractive in that it introduces 62.2 bits of randomness into the mapping, which 
allows each $A$ to be used in more than one mapping. 
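
The amount of randomness contributed by a non-singular 8-by-8 bit matrix can be checked by counting such matrices. The sketch below uses the standard product formula for the number of invertible n-by-n matrices over GF(2); the function name is chosen here for illustration.

```python
import math

def count_nonsingular(n):
    """Number of invertible n-by-n matrices over GF(2): product of (2^n - 2^i), i = 0..n-1."""
    count = 1
    for i in range(n):
        count *= (1 << n) - (1 << i)
    return count

bits_matrix = math.log2(count_nonsingular(8))   # randomness in the choice of A
bits_total = bits_matrix + 8                    # plus the eight bits of the constant b
```

The count works out to roughly 62.2 bits for the matrix alone, and roughly 70.2 bits once the 8-bit constant is included.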

Economy of design was crucial, and the design was thus constrained so that only 
one lookup table was to be used and the random mapping data was not referenced at 
all during the execution of the cipher. The result was a cryptographically strong 
cipher that executed at the same speed whether random mapping data was to be used 
or not, excluding the pre- and post-calculation relating to the random mappings. 

The calculation $x' = Ax + b$ may be implemented as a 
matrix multiplication or as a lookup table. Similarly, determination of the inverse 
mapping may be done by matrix inversion or by generating an inverting lookup table. 
As the amount of data to be mapped is small, we found both matrix multiplication and 
inversion to be more economical in our implementation. 

A DPA attack was mounted against the design, with full knowledge of the 
implementation and of exactly when every datum is processed. The attack is 
based on averaged waveforms, and is hence a true first-order DPA attack. This attack 
is successful against the implementation when the mapping randomness is disabled, 
with fewer than 10 observed waveforms used to produce each averaged waveform. 
The same attack is unsuccessful when the mapping randomness is enabled, even with $10^4$ 
observed waveforms used to produce each averaged waveform. 



6 An Example: Making an XOR-Based Cipher DPA-Resistant 

In this example, a simplistic cipher is used, constructed entirely from modulo-2 
addition of octets (vectors of eight bits each) and a single lookup table that produces 
an 8-bit output value for each 8-bit input value. A single mapping is used to obscure 
every data octet in this example, of the form $d'_{n,i} = A_i d_n + b_i$. 

$d_n$ is a typical octet of the data set $d = (d_0, d_1, \ldots)$, which may be the key, input and 
output of the cipher; $A_i$ is a randomly selected non-singular 8-by-8 matrix of bits; and 
$b_i$ is a randomly selected octet. The subscript $n$ indicates a selected octet of the data 
set, and the subscript $i$ identifies the mapping currently in use. 
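
The affine mapping over GF(2) can be sketched as follows. The representation of the matrix as a list of eight row bytes, the bit ordering and all function names are assumptions made for this illustration, with "+" implemented as exclusive-or.

```python
import secrets

def mat_vec(A, x):
    """Multiply the 8x8 bit matrix A (list of 8 row bytes) by octet x over GF(2)."""
    y = 0
    for i, row in enumerate(A):
        y |= (bin(row & x).count("1") & 1) << i   # parity of the selected bits
    return y

def is_nonsingular(A):
    """Gauss-Jordan elimination over GF(2); True if A has full rank."""
    rows = list(A)
    for col in range(8):
        pivot = next((r for r in range(col, 8) if (rows[r] >> col) & 1), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(8):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return True

def random_mapping():
    """Pick a random non-singular matrix A and a random octet b."""
    while True:
        A = [secrets.randbelow(256) for _ in range(8)]
        if is_nonsingular(A):
            return A, secrets.randbelow(256)

def map_octet(A, b, d):
    """Mapped octet A d + b, with + denoting modulo-2 addition."""
    return mat_vec(A, d) ^ b

A, b = random_mapping()
data = [0x00, 0x01, 0x80, 0xFF]                  # example data octets
mapped = [map_octet(A, b, d) for d in data]
```

Because the matrix is non-singular, the mapping is a bijection on octets, so every mapped value corresponds to exactly one original value.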

In Fig. 7, these operations have been combined to illustrate the example. A typical 
cryptographic cipher (encryption or decryption) would use many more operations and 
the data sizes of $k$, $x$ and $y$ would generally each be at least 64 bits. Each arrow 
represents the flow of one octet. The diagram shows equivalent operations with 
mapping of the data, but does not show the incremental mapping of the key (described 
in Section 4.4). 

Fig. 7. Simplistic cipher illustrating the mapping process (excluding per-use key mapping) 

The initially mapped key $k'_{n,0} = A_0 k_n + b_0$ and the mapping $f_0 = (A_0, b_0)$ are stored. Prior 
to any use of the key, a fresh mapping is performed by choosing new $G_i$ and $h_i$. 
Following this, $k'_{n,i-1}$ is replaced by $k'_{n,i} = G_i k'_{n,i-1} + h_i$, $A_{i-1}$ by $A_i = G_i A_{i-1}$, and $b_{i-1}$ by 
$b_i = G_i b_{i-1} + h_i$. This has been omitted from Fig. 7 for simplicity. 
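
The re-mapping of the stored key with a fresh matrix and constant can be sketched as below. The matrix representation (a list of eight row bytes), the key octets and the function names are assumptions made for illustration; the original key appears only in the set-up and in the final check.

```python
import secrets

def mat_vec(M, x):
    """Multiply the 8x8 bit matrix M (list of 8 row bytes) by octet x over GF(2)."""
    y = 0
    for i, row in enumerate(M):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def mat_mul(G, A):
    """Matrix product G A over GF(2): row i is the XOR of the rows of A selected by G[i]."""
    out = []
    for g_row in G:
        acc = 0
        for j in range(8):
            if (g_row >> j) & 1:
                acc ^= A[j]
        out.append(acc)
    return out

def is_nonsingular(M):
    """Gauss-Jordan elimination over GF(2); True if M has full rank."""
    rows = list(M)
    for col in range(8):
        pivot = next((r for r in range(col, 8) if (rows[r] >> col) & 1), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(8):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return True

def random_nonsingular():
    while True:
        M = [secrets.randbelow(256) for _ in range(8)]
        if is_nonsingular(M):
            return M

def remap(mapped_key, A, b):
    """Apply a fresh (G, h) to the mapped key and to the stored mapping (A, b)."""
    G, h = random_nonsingular(), secrets.randbelow(256)
    new_key = [mat_vec(G, octet) ^ h for octet in mapped_key]  # G k' + h
    return new_key, mat_mul(G, A), mat_vec(G, b) ^ h           # G A and G b + h

key = [0x2B, 0x7E, 0x15, 0x16]                   # example key octets
A, b = random_nonsingular(), secrets.randbelow(256)
stored = [mat_vec(A, k) ^ b for k in key]        # initially mapped key
for _ in range(3):                               # three uses of the key
    stored, A, b = remap(stored, A, b)
```

The invariant that the stored octets equal the current mapping applied to the key holds after every call, because $G(Ak + b) + h = (GA)k + (Gb + h)$.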

Every lookup table, $s$, is replaced by its equivalent $s_i$ for operation on mapped 
values, defined by $s_i(z) = A_i\, s(A_i^{-1}(z + b_i)) + b_i$. (It is not necessary to generate the 
inverse of the matrix when generating $s_i$.) The input data octets, $x_n$, are then mapped 
using the same mapping, $x'_{n,i} = A_i x_n + b_i$. The input is ciphered using the original 
cipher, except for the substituted lookup table. Aside from the per-use key mapping, the 
substituted lookup table, the initial mapping and final mapping, there is no change to 
the computation involved in the cipher. 
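
Generating the substituted lookup table without inverting the matrix can be sketched as follows: for every possible input, the mapped output is written at the mapped input position. The example table `s`, the matrix representation and the function names are assumptions made for this illustration.

```python
import secrets

def mat_vec(A, x):
    """Multiply the 8x8 bit matrix A (list of 8 row bytes) by octet x over GF(2)."""
    y = 0
    for i, row in enumerate(A):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def is_nonsingular(A):
    """Gauss-Jordan elimination over GF(2); True if A has full rank."""
    rows = list(A)
    for col in range(8):
        pivot = next((r for r in range(col, 8) if (rows[r] >> col) & 1), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(8):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return True

def random_mapping():
    while True:
        A = [secrets.randbelow(256) for _ in range(8)]
        if is_nonsingular(A):
            return A, secrets.randbelow(256)

def build_mapped_table(s, A, b):
    """Table operating on mapped values: entry at (A x + b) holds A s(x) + b."""
    table = [0] * 256
    for x in range(256):                         # iterates over all inputs, not over data
        table[mat_vec(A, x) ^ b] = mat_vec(A, s[x]) ^ b
    return table

s = [(x * x + 0x63) % 256 for x in range(256)]   # arbitrary example lookup table
A, b = random_mapping()
s_mapped = build_mapped_table(s, A, b)

x = 0x3A                                         # example input octet
x_mapped = mat_vec(A, x) ^ b                     # input mapped before ciphering
y_mapped = s_mapped[x_mapped]                    # lookup performed on mapped data only
```

The construction relies on the matrix being non-singular, so that every table position is written exactly once.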

Finally, where the output, $y$, is to remain secret, such as with the value of a key, $y'_{n,i}$, 
$A_i$ and $b_i$ are used instead of $y$. If it is to be mapped into its non-obscured state, this 
may be expressed as $y_n = A_i^{-1}(y'_{n,i} + b_i)$. 

A pivotal observation to be made is that due to the large number ($2^{70.2}$) of possible 
mappings, the same mapping can be used for hiding more than one octet of data 
effectively. This allows the modified cipher to remain simple. A simpler mapping 
may not hide multiple bytes adequately against DPA. 

Since the mapping $(A_i, b_i)$ and the mapped data $d'_{n,i}$ are changed on every use, the 
processed data (including the key) is not correlated with the original (non-obscured) 
data. Only a moderately complex function of several bits of data and the mapping is 
correlated to the original data. Each bit of the original data can be expressed as a 
function of the 17 bits being processed, making attacks complex. The matrix $A$ does 
not in itself increase the order of a DPA attack; it merely makes it more complex and 
less sensitive. This may be seen from the fact that there is a correlation between the 
Hamming weight before and after multiplication by $A$, and that the zero vector in 
particular remains unchanged. On its own, it does not rule out a first-order DPA 
attack. 

The approach of this example, applied to the design of a cryptographically strong cipher, 
may be used effectively in smart cards available today, including those that use 8-bit 
processors and modest quantities of storage space. 



7 Conclusions 

This is a practical defence against DPA known to the author that needs neither a 
system-level design change (such as key-use sequence numbering) nor a doubling of 
processing cost (such as a share-based mechanism), although it is best introduced in 
association with cipher redesign. A practical smart card implementation of a strong 
algorithm for a real system has withstood determined and sensitive searches for 
first-order DPA vulnerabilities. 

The technique may be applied to existing ciphers (such as DES and IDEA). DES, 
due to the need to use mapping data extensively throughout the modified cipher, 
results in something similar to the “duplication method” of Louis Goubin et al. 
IDEA, due to the way in which it combines incompatible operators, needs insertion of 
several pre-calculated re-mapping tables, but would be practical and necessary in 
combating emitted-RF DPA attacks on PC-based ciphering. Rijndael, the AES 
winner, has a pleasing structure for application of this technique. 



Abstract. This paper is concerned with the design of cryptographic 
APIs (Application Program Interfaces), and in particular with the part 
of such APIs concerned with computing Message Authentication Codes 
(MACs). In some cases it is necessary for the cryptographic API to offer 
the means to ‘part-compute’ a MAC, i.e. perform the MAC calculation 
for a portion of a data string. In such cases it is necessary for the API 
to input and output ‘chaining variables’. As we show in this paper, 
such chaining variables need very careful handling lest they increase the 
possibility of MAC key compromise. In particular, chaining variables 
should always be output in encrypted form; moreover the encryption 
should operate so that re-occurrence of the same chaining variable will 
not be evident from the ciphertext. 

Keywords: Message Authentication Code, cryptographic API, cryptanalysis 



1 Introduction 

MACs, i.e. Message Authentication Codes, are a widely used method for protecting 
the integrity and guaranteeing the origin of transmitted messages and 
stored files. To use a MAC scheme it is necessary for the sender and recipient of 
a message (or the creator and verifier of a stored file) to share a secret key $K$, 
chosen from some (large) keyspace. The data string to be protected, $D$ say, is 
input to a MAC function $f$, along with the secret key $K$, and the output is the 
MAC. We write 

MAC $= f_K(D)$. 

The MAC is then sent or stored with the message, i.e. the string which is transmitted 
or stored is $D \| f_K(D)$, where $x \| y$ denotes the concatenation of data items 
$x$ and $y$. 

* The views expressed in this paper are personal to the author and not necessarily 
those of Visa International. 

In this paper we are concerned with a particular class of cryptographic APIs, 
namely those providing access to the functions of a cryptographic module. Such 
modules will typically provide a variety of cryptographic operations in conjunction 
with secure key storage, all within a physically secure sub-system. We moreover 
assume that the API is designed so that use may be made of the cryptographic 
functions without access being given to internally stored keys. We are 
concerned here with attacks which may, in certain circumstances, enable a user 
with temporary access to the cryptographic API to use its functions to discover 
an internally stored key. 



2 APIs, MACs, and Chaining Variables 

Most cryptographic modules have relatively limited amounts of internal memory. 
They also typically require the data they process to be passed as parameters in 
an API procedure call, since they typically will not have the means to directly 
access memory in their host system. As a result they can normally only compute 
a MAC on a data string of a certain maximum length. Thus provisions are often 
made in the cryptographic API for the module to compute a ‘part MAC’ on a 
portion of a data string. Computation of a MAC on the complete data string 
will then require several calls to the module. 

Given that it is always desirable to minimise the stored state within the module, 
it is therefore necessary to provide the means to input/output ‘partial MAC 
computation state information’ to/from the module. This is normally achieved 
by providing for the input/output of ‘chaining variables’ within the MAC processing 
procedure calls of the API. That is, it should be possible to input and 
output the value of $H_i$ for a certain $i$, as defined in Section 3 below. 

Based on this idea, we can identify four different types of MAC computation 
call to a cryptographic module: 

— Type A: (‘all’) where the entire data string to be MACed is passed in a single 
call and a MAC is output (no chaining variables are input or output), 

— Type B: (‘beginning’) where the first part of a data string is passed to the 
module (and a chaining variable is output but not input), 

— Type M: (‘middle’) where the central part of a data string is passed to the 
module (and chaining variables are input and output), and 

— Type E: (‘end’) where the last part of a data string is passed to the module 
(and a chaining variable is input but not output, and a MAC is output). 

3 On the Computation of MACs 

MACs are most commonly computed using a block cipher in a scheme known as 
a CBC-MAC (for Cipher Block Chaining MAC). There are a number of variants 
on the basic CBC-MAC idea, although the following general model (see [1,2]) 
covers most of these variants. 

The computation of a MAC on a data string $D$, assumed to be a string of 
bits, using a block cipher with block length $n$, is performed with the following 
steps. 

1. Padding and splitting. The data string $D$ is subjected to a padding process, 
involving the addition of bits to $D$, the output of which (the padded string) 
is a bit string whose length is an integer multiple of $n$ (say $qn$). The padded string 
is divided (or ‘split’) into a series of $n$-bit blocks, $D_1, D_2, \ldots, D_q$. 

2. Initial transformation. An Initial transformation $I$, possibly key-controlled, 
is applied to $D_1$ to yield the first chaining variable $H_1$, i.e. $H_1 = I(D_1)$. 

3. Iteration. Chaining variables are computed as $H_i = e_K(D_i \oplus H_{i-1})$ for $i$ 
successively equal to $2, 3, \ldots, q$, where $K$ is a block cipher key, $e_K$ denotes 
block cipher encryption using key $K$, and $\oplus$ 
denotes bit-wise exclusive-or of $n$-bit blocks. 

4. Output transformation. The $n$-bit Output block $G = g(H_q)$, where $g$ is the 
output transformation (which may optionally be key-controlled). 

5. Truncation. The MAC of $m$ bits is set equal to the leftmost $m$ bits of $G$. 
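
The general model, together with the four call types of Section 2, can be sketched as below. Everything specific in this sketch is an assumption made for illustration: the block cipher is replaced by a toy Feistel permutation built from SHA-256 (not a real cipher), Initial transformation 1 and Output transformation 1 are used, padding is assumed to have been applied already, and the chaining variable is passed in the clear, which is the very practice this paper argues against.

```python
import hashlib

BLOCK = 8  # block length n = 64 bits, expressed in bytes

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for the block cipher; NOT secure."""
    left, right = block[:4], block[4:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:4]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def part_mac(key, blocks, chaining=None):
    """Process some blocks and return the chaining variable after the last one."""
    h = chaining
    for d in blocks:
        if h is None:
            h = toy_encrypt(key, d)                                    # initial transformation
        else:
            h = toy_encrypt(key, bytes(a ^ b for a, b in zip(d, h)))   # iteration step
    return h

def finish(h_q, m_bytes=4):
    """Identity output transformation, then truncation to the leftmost m bits."""
    return h_q[:m_bytes]

key = b"demo-key"
padded = b"0123456789abcdefghijklmnopqrstuv"           # already padded: q = 4 blocks
blocks = [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]

mac_all = finish(part_mac(key, blocks))                # Type A: whole string in one call
h = part_mac(key, blocks[:1])                          # Type B: chaining variable out
h = part_mac(key, blocks[1:3], h)                      # Type M: chaining variable in and out
mac_parts = finish(part_mac(key, blocks[3:], h))       # Type E: chaining variable in, MAC out
```

Both routes give the same MAC, which shows that the chaining variable carries all the state the module needs between calls.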

The relevant international standard, namely ISO/IEC 9797-1 [1], contains six 
different CBC-MAC variants. Four are based on combinations of two Initial and 
three Output transformations. The various Initial and Output transformations 
have been introduced to avoid a series of attacks possible against the ‘original’ 
CBC-MAC (using Initial transformation 1 and Output transformation 1). 

— Initial transformation 1 is: $I(D_1) = e_K(D_1)$, where $K$ is the key used in the 
Iteration step, i.e. it is the same as the Iteration step. 

— Initial transformation 2 is: $I(D_1) = e_{K''}(e_K(D_1))$, where $K$ is the key used 
in the Iteration step, and $K'' \neq K$. 

— Output transformation 1 is: $g(H_q) = H_q$, i.e. the identity transformation. 

— Output transformation 2 is: $g(H_q) = e_{K'}(H_q)$, where $K' \neq K$. 

— Output transformation 3 is: $g(H_q) = e_K(d_{K'}(H_q))$, where $K' \neq K$. 

These options are combined in the ways described in Table 1 to yield four of the 
six different CBC-MAC schemes defined in ISO/IEC 9797-1 [1]. 



Table 1. CBC-MAC schemes defined in ISO/IEC 9797-1 

Algorithm   Initial          Output           Notes 
number      transformation   transformation 

1           1                1                The ‘original’ CBC-MAC scheme, [3]. 

2           1                2                $K'$ may be derived from $K$. 

3           1                3                CBC-MAC-Y, [4]. The values of $K$ and 
                                              $K'$ shall be chosen independently. 

4           2                2                $K''$ shall be derived from $K'$ in such a way 
                                              that $K' \neq K''$. 




Finally note that three Padding Methods are defined in [1]. Padding Method 
1 simply involves adding between 0 and $n - 1$ zeros, as necessary, to the end 
of the data string. Padding Method 2 involves the addition of a single 1 bit at 
the end of the string followed by between 0 and $n - 1$ zeros. Padding Method 3 
involves prefixing the string with an $n$-bit block encoding the bit length of the 
string, with the end of the string padded as in Method 1. When using a MAC 
algorithm it is necessary to choose one of the padding methods and the degree 
of truncation. 
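
The three padding methods, as described above, can be sketched on bit strings. Representing the data as a Python string of '0' and '1' characters, and the function names, are choices made here for readability; the sketch follows the description in this section rather than the text of the standard.

```python
def pad_method_1(bits: str, n: int) -> str:
    """Append between 0 and n-1 zeros so the length is a multiple of n."""
    return bits + "0" * (-len(bits) % n)

def pad_method_2(bits: str, n: int) -> str:
    """Append a single 1 bit, then between 0 and n-1 zeros."""
    return pad_method_1(bits + "1", n)

def pad_method_3(bits: str, n: int) -> str:
    """Prefix an n-bit block encoding the bit length, then pad the end as in Method 1."""
    if len(bits) >= 1 << n:
        raise ValueError("bit length does not fit in one n-bit block")
    length_block = format(len(bits), "b").zfill(n)
    return length_block + pad_method_1(bits, n)

data = "101"                              # example 3-bit data string
padded_1 = pad_method_1(data, 4)
padded_2 = pad_method_2(data, 4)
padded_3 = pad_method_3(data, 4)
```

Note that Method 1 maps the strings "101" and "1010" to the same padded string, whereas Methods 2 and 3 keep them distinct.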



4 Attacks on CBC-MACs 

There are two main types of attack on MAC schemes. In a MAC forgery attack, 
an unauthorised party is able to obtain a valid MAC on a message which has not 
been produced by the holders of the secret key. Typically the attacker will use a 
number of valid MACs and corresponding messages to obtain the forgery. A key 
recovery attack enables the attacker to obtain the secret MAC key. The attacker 
will typically need a number of MACs to perform such an attack, and may 
require considerable amounts of off-line computation. Note that a successful key 
recovery attack enables the construction of arbitrary numbers of MAC forgeries. 
In this paper we are exclusively concerned with key recovery attacks. 

All attacks require certain resources (e.g. one or more MACs for known data 
strings). Clearly the fewer resources that are required for an attack, the more 
effective it is. As a result we introduce a simple way of quantifying an attack’s 
effectiveness. 

Following the approach used in [1], we do this by means of a four-tuple 
which specifies the size of the resources needed by the attacker. For each attack 
we specify the tuple $[a, b, c, d]$ where $a$ is the number of off-line block cipher 
encipherments (or decipherments), $b$ is the number of known data string/MAC 
pairs, $c$ is the number of chosen data string/MAC pairs, and $d$ is the number of 
on-line MAC verifications. The reason to distinguish between $c$ and $d$ is that, in 
some environments, it may be easier for the attacker to obtain MAC verifications 
(i.e. to submit a data string/MAC pair and receive an answer indicating whether 
or not the MAC is valid) than to obtain the genuine MAC value for a chosen 
message. 

5 Key Recovery Attacks 

In order to understand why so many CBC-MAC variants exist, we 
need to discuss some elementary key recovery attacks. 

MAC algorithm 1 can be attacked given knowledge of one known message/MAC 
pair (assuming that $m > k$, where $k$ is the key length in bits). The attacker simply 
recomputes the MAC on the message with every possible key, until the key is found giving the 
correct MAC. This attack has complexity $[2^k, 1, 0, 0]$, which is feasible if $k$ is 
sufficiently small. E.g., if the block cipher is DES then $k = 56$, and it has been 
shown, [5], that a machine can be built for a few hundred thousand dollars which 
will search through all possible keys in a few days. 

MAC algorithm 2 is subject to a similar attack. However, MAC algorithm 
3 (CBC-MAC-Y) is not subject to this attack, which is one reason that it has 
been adopted for standardisation. In fact the best known key recovery attack 
on this MAC algorithm has complexity $[2^{k+1}, 2^{n/2}, 0, 0]$, as described in [6]. An 
alternative key recovery attack, requiring only one known MAC/data string pair, 
but a larger number of verifications, is presented in [7]; this attack has complexity 
$[2^k, 1, 0, 2^k]$. 

Whilst there are many situations where MAC Algorithm 3 is adequate, in 
some cases the above-referenced key recovery attacks pose a threat to the
secrecy of the MAC key. One example of a scenario where this might be true
is where users of a service (e.g. banking or mobile telephony) are issued with 
tamper-resistant devices containing unique MAC keys. It might be possible for 
an individual responsible for shipping these devices to consumers to interrogate 
devices for a considerable period of time before passing them to their intended 
user. This interrogation might involve generating and/or verifying large numbers 
of MACs. If such an interrogation could be used to obtain the secret key, then the 
security of the system might be seriously compromised. This is precisely the case 
for GSM implementations using the COMP128 algorithm, where, as described 
in recent postings on the web, [8], access to a SIM (Subscriber Identity Module) 
by a retailer could enable the authentication key to be discovered prior to a SIM 
being issued to a customer. 

This motivates the development of MAC algorithm 4 (first described in [7]), 
which was designed to offer improved security relative to MAC algorithm 3 at 
the same computational cost. Unfortunately, as shown in [9], MAC algorithm 4 
only offers a significant gain in security with respect to key recovery attacks if 
Padding Method 3 is used. 

6 Chaining Variable Protection 

6.1 The Need for Encryption 

Suppose the chaining variable $H_i$ is passed to and from the module in
unencrypted form. Suppose also that an attacker has temporary access to the API of
the cryptographic module, and wishes to recover the MAC key.

We first consider the case of MAC algorithm 3, where we suppose that an
independent pair of keys $(K, K')$ is used. A single use of the ‘Type B’ call to
the MAC computation function will return a chaining variable which depends
only on the first key $K$ (the chaining variable is essentially a MAC as generated
by MAC algorithm 1). Knowledge of this one chaining variable, and of the data
string used to yield it, will be sufficient to recover $K$ in an attack of complexity
$[2^k, 1, 0, 0]$. Given an additional MAC/data string pair, the other key $K'$ can be
recovered with a similar ‘brute force’ search, and hence we have recovered the
entire key at a total complexity of $[2^{k+1}, 2, 0, 0]$. That is, MAC algorithm 3 is now
no more secure than MAC algorithms 1 and 2.
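
The two-stage search can be illustrated as follows. The sketch assumes a toy 32-bit Feistel cipher with 12-bit keys (hypothetical, so that both searches run quickly) and assumes that MAC algorithm 3 applies the output transformation $e_K(d_{K'}(H_q))$ to the final chaining variable; the function names are illustrative only.

```python
import hashlib

KEY_BITS = 12

def _round(key, rnd, half):
    # Hypothetical round function built from SHA-256.
    data = key.to_bytes(2, "big") + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def toy_encrypt(key, block):
    left, right = block >> 16, block & 0xFFFF
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 16) | right

def toy_decrypt(key, block):
    left, right = block >> 16, block & 0xFFFF
    for rnd in reversed(range(4)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << 16) | right

def chain(key, blocks):
    # CBC chaining under K alone: this is what a 'Type B' call would leak.
    h = 0
    for d in blocks:
        h = toy_encrypt(key, d ^ h)
    return h

def mac3(k, k_prime, blocks):
    # Assumed output transformation for MAC algorithm 3.
    return toy_encrypt(k, toy_decrypt(k_prime, chain(k, blocks)))

def attack(part_blocks, leaked_h, blocks, mac):
    found = []
    for k in range(1 << KEY_BITS):            # first search: recover K
        if chain(k, part_blocks) != leaked_h:
            continue
        h = chain(k, blocks)
        target = toy_decrypt(k, mac)          # equals d_K'(H_q) when k = K
        for k_prime in range(1 << KEY_BITS):  # second search: recover K'
            if toy_decrypt(k_prime, h) == target:
                found.append((k, k_prime))
    return found

K, K_PRIME = 0x5A7, 0x31C
part = [0x00000040, 0xCAFEBABE]
full = [0x11111111, 0x22222222, 0x33333333]
recovered = attack(part, chain(K, part), full, mac3(K, K_PRIME, full))
print(recovered)
```

The total work is roughly two passes over the key space, matching the $2^{k+1}$ figure in the text.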

The same attack applies to MAC algorithm 4, and thus, for APIs supporting 
MAC algorithms 3 and/or 4, encryption of the chaining variable is essential. 

6.2 Simple Encryption Is Not Enough

We next see that, for MAC algorithm 4, encryption is not always sufficient. As 
we noted above, MAC algorithm 4 is best used in conjunction with Padding 
Method 3. However, although use of this padding method prevents the attacks 
described in [9], it does not prevent attacks made possible by the availability of 
encrypted chaining variables. This is based on the assumption that if the same 
chaining variable is output twice, then the encrypted version of that chaining 
variable will be the same on the two occasions. This will enable us to find a 
‘collision’, which can be used as part of a key recovery attack. We next describe 
such an attack. 

The attacker first chooses two $n$-bit blocks: $D_1$ and $D_1'$. These will represent
the first blocks of padded messages, and hence they will encode the bit length
of the messages (since Padding Method 3 is being used). Typically one might
choose $D_1$ to be an encoding of $4n$ and $D_1'$ to be an encoding of $3n$, which will
mean that $D_1$ will be the first block of a 5-block padded message and $D_1'$ will be
the first block of a 4-block padded message. For the purposes of the discussion
here we suppose that $D_1$ encodes a bit-length resulting in a $(q+1)$-block padded
message, and $D_1'$ encodes a bit-length resulting in a $q$-block padded message
($q \geq 4$).

The attacker now acquires $2^{n/2}$ ‘Type B’ MAC computation calls to the
cryptographic module for a set of $2^{n/2}$ two-block ‘part messages’ of the form $D_1$,
$X$, where $X$ varies at random. The attacker also acquires a further $2^{n/2}$ ‘Type
B’ calls for the $2^{n/2}$ two-block ‘part messages’ of the form $D_1'$, $Y$, where $Y$ varies
at random. By routine probabilistic arguments (called the ‘birthday attack’, see
[10]), there is a good chance that two of these ‘part messages’, one of each form,
will yield the same chaining variable. We are assuming that this will be apparent from the
encrypted versions of the chaining variables.
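
As a rough check on the ‘good chance’ claimed here, and under the assumption (not stated in the text) that the chaining variables behave like independent, uniformly distributed $n$-bit values, the probability that some chaining variable from the first set of $2^{n/2}$ matches one from the second set can be estimated as follows.

```latex
\Pr[\text{no match}] \approx \left(1 - 2^{-n}\right)^{2^{n/2}\cdot 2^{n/2}}
  \approx e^{-1} \approx 0.37,
\qquad
\Pr[\text{at least one match}] \approx 1 - e^{-1} \approx 0.63 .
```

Under this heuristic a modest constant factor more queries pushes the success probability close to one.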

The attacker now has a pair of $n$-bit block-pairs: $(D_1, D_2)$ and $(D_1', D_2')$ say,
which should be thought of as the first pair of blocks of longer padded messages,
with the property that the ‘partial MACs’ for these two pairs are equal, i.e. so
that if

$$H_1 = e_{K''}(e_K(D_1)),$$
$$H_2 = e_K(D_2 \oplus H_1),$$
$$H_1' = e_{K''}(e_K(D_1')), \text{ and}$$
$$H_2' = e_K(D_2' \oplus H_1'),$$

then $H_2 = H_2'$.

The remainder of the attack is a modified version of Attack 1 from [9]. The
attacker next acquires $2^{n/2}$ ‘Type A’ calls to the cryptographic module for a set
of $2^{n/2}$ $(q+1)$-block padded messages of the form

$$D_1, D_2, X_1, X_2, \ldots, X_{q-1},$$

where $X_1, X_2, \ldots, X_{q-1}$ can be arbitrary. As previously, there is a good chance
that two of these messages will yield the same MAC. Suppose the two padded
strings are $D_1, D_2, E_3, E_4, \ldots, E_{q+1}$ and $D_1, D_2, E_3', E_4', \ldots, E_{q+1}'$, and suppose
that the common MAC is $M$.

Now submit the following two padded strings for MACing, namely:

$$D_1', D_2', E_3, E_4, \ldots, E_q$$

and

$$D_1', D_2', E_3', E_4', \ldots, E_q'.$$

If we suppose that the MACs obtained are $M'$ and $M''$ respectively, then we
know immediately that

$$d_{K'}(M') \oplus E_{q+1} = d_K(d_{K'}(M)) = d_{K'}(M'') \oplus E_{q+1}'.$$

Now run through all possibilities $L$ for the unknown key $K'$, and set
$x(L) = d_L(M')$ and $y(L) = d_L(M'')$. For the correct guess $L = K'$ we will have
$x(L) = d_{K'}(M')$ and $y(L) = d_{K'}(M'')$, and hence $E_{q+1} \oplus x(L) = E_{q+1}' \oplus y(L)$. This will
hold for $L = K'$ and probably not for any other value of $L$, given that $k < n$ (if
$k > n$ then either a second ‘collision’ or a larger brute force search will probably
be required).

Having recovered $K'$, we do an exhaustive search for $K$ using the relation
$d_{K'}(M') \oplus E_{q+1} = d_K(d_{K'}(M))$ (which requires $2^k$ block cipher encryptions).
Finally we can recover $K''$ by exhaustive search on any known text/MAC pair,
e.g. from the set of $2^{n/2}$, which again will require $2^k$ block cipher encryptions.

It follows that the above attack has complexity $[2^{k+2}, 0, 3 \cdot 2^{n/2}, 0]$, which is not
significantly greater than the complexity of the best known key recovery attacks
against MAC algorithm 3.

Thus it is important that the encryption is performed in such a way that 
if the same chaining variable is output twice then this is not evident from the 
ciphertext. This can be achieved in a variety of ways. One possibility is to encrypt 
the chaining variable using a randomly generated session key, and to encrypt this 
session key with a long term key held internally to the cryptographic module. 
The encrypted session key can be output along with the encrypted chaining 
variable. 
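
One way to picture the session-key approach is sketched below. It is an illustration only: HMAC-SHA256 output is used as a keystream in place of a real cipher, the per-export nonce used when wrapping the session key is an addition of this sketch rather than something the text prescribes, and all names (LONG_TERM_KEY, export_chaining_variable, import_chaining_variable) are hypothetical.

```python
import hashlib
import hmac
import secrets

LONG_TERM_KEY = secrets.token_bytes(32)   # would be held inside the module only

def _pad(key, label, length):
    # Keystream from HMAC-SHA256; toy stand-in for a cipher (length <= 32).
    return hmac.new(key, label, hashlib.sha256).digest()[:length]

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def export_chaining_variable(h):
    # A fresh session key per export hides repeated chaining variables.
    session_key = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    wrapped_key = _xor(session_key, _pad(LONG_TERM_KEY, nonce, 16))
    enc_h = _xor(h, _pad(session_key, b"chaining-variable", len(h)))
    return nonce, wrapped_key, enc_h

def import_chaining_variable(blob):
    nonce, wrapped_key, enc_h = blob
    session_key = _xor(wrapped_key, _pad(LONG_TERM_KEY, nonce, 16))
    return _xor(enc_h, _pad(session_key, b"chaining-variable", len(enc_h)))

h = bytes.fromhex("0123456789abcdef")
first = export_chaining_variable(h)
second = export_chaining_variable(h)
print(first[2] != second[2])
```

Exporting the same chaining variable twice yields unrelated-looking ciphertexts, so the collision test used in the attack above is no longer available to an outside observer.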



6.3 Message Length Protection 

If Padding Method 3 is used, then it is clearly necessary to inform the
cryptographic module of the total message length when the first ‘Type B’ call is made,
so that the Padding Block can be constructed and used to perform the first part 
of the MAC calculation. We next show that, along with the chaining variable, 
it is important for the on-going MAC calculation to ‘keep track’ of this message 
length, and of the amount of data so far processed. 

Suppose this is not the case. Then a simpler variant of the attack described 
in Section 6.2 above becomes possible. This operates as follows. 

The attacker arranges for the part MAC to be computed on the padded
two-block ‘part message’ $(D_1, D_2)$ using a ‘Type B’ MAC computation call to
obtain the resulting chaining variable $H_2$, which may or may not be an encrypted
version of the chaining variable resulting from the MAC operation. The attacker
then assembles approximately $2^{n/2}$ distinct message strings $X_1, X_2, \ldots, X_{q-1}$
($q \geq 3$) and arranges for their MACs to be computed using as many ‘Type M’
and ‘Type E’ calls as necessary, and with $H_2$ as the initial chaining variable
input to the first ‘Type M’ call. The attacker effectively finds the MACs for
the message strings $D_1, D_2, X_1, \ldots, X_{q-1}$. There is a good chance that two of
these messages will yield the same MAC. Suppose that the two padded strings
are $D_1, D_2, E_3, \ldots, E_{q+1}$ and $D_1, D_2, E_3', \ldots, E_{q+1}'$. Denote the common MAC
by $M$.

Since we are assuming that the cryptographic module does not keep track of
the message length, it is not important whether the contents of block $D_1$ match
the true length of the rest of the message. The MAC values obtained may not be
true MACs of the (padded) input strings using Padding Method 3. Our objective
is to recover the secret keys and not to present verifiable MAC forgeries. The fact
that the calculated MAC values are not true MACs is unimportant and has
no bearing on the key recovery attack to be described.

The attacker now submits as many ‘Type M’ and ‘Type E’ calls as necessary
to obtain the ‘MAC’ value for the two strings $D_1, D_2, E_3, \ldots, E_q$ and
$D_1, D_2, E_3', \ldots, E_q'$ using the partial MAC $H_2$ for the input pair $(D_1, D_2)$ as
chaining variable input to the first ‘Type M’ call in each case. Denote the final
‘MAC’ outputs by $M'$ and $M''$ respectively. As before, we know immediately
that

$$d_{K'}(M') \oplus E_{q+1} = d_K(d_{K'}(M)) = d_{K'}(M'') \oplus E_{q+1}'.$$

The attack now proceeds as in the previous case to recover $K'$, $K$ and finally
$K''$. The complexity of this version of the attack is $[2^{k+2}, 0, 2^{n/2}, 0]$, which is less
than the complexity of the attack described previously.

The main conclusion we can draw from this attack is that, when Padding 
Method 3 is used, a further variable should be input and output along with 
the chaining variable. This variable should indicate the number of data bits 
remaining to be processed. In addition, integrity protection should be deployed 
over the entire set of input/output variables, to prevent modifications being 
made. 
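
A minimal sketch of such a protected state is given below, assuming the chaining variable has already been encrypted by the module. The layout (an 8-byte count of remaining bits followed by the encrypted chaining variable and an HMAC-SHA256 tag), the key MODULE_KEY and the function names are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
import secrets
import struct

MODULE_KEY = secrets.token_bytes(32)   # hypothetical integrity key kept in the module

def export_state(enc_chaining_variable, bits_remaining):
    # Bind the remaining-length counter to the chaining variable with a tag.
    body = struct.pack(">Q", bits_remaining) + enc_chaining_variable
    tag = hmac.new(MODULE_KEY, body, hashlib.sha256).digest()
    return body + tag

def import_state(blob):
    body, tag = blob[:-32], blob[-32:]
    expected = hmac.new(MODULE_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state blob failed integrity check")
    (bits_remaining,) = struct.unpack(">Q", body[:8])
    return body[8:], bits_remaining

def process_block(blob, block_bits=64):
    enc_h, remaining = import_state(blob)
    if remaining < block_bits:
        raise ValueError("more data supplied than the declared message length")
    # The module would decrypt enc_h, absorb one block and re-encrypt here.
    return export_state(enc_h, remaining - block_bits)

state = export_state(b"\x11" * 8, 128)
state = process_block(state)
state = process_block(state)
print(import_state(state)[1])
```

With the counter exhausted, a further ‘Type M’ style call is refused, which is the behaviour needed to block the message-length variant of the attack.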



7 Other Issues 

The attacks described above are only examples of possible weaknesses of APIs 
when ‘partial’ MAC calculations are performed. We now mention one other area 
where problems may arise. 

In some applications it is desirable to have a cryptographic module which can 
exist in two operational modes. In one mode MAC calculations are possible, but 
in the other mode only MAC verifications are possible. Indeed, by implementing 
such a scheme, it is possible to gain some of the desirable properties of a digital 
signature, simply by using a MAC. 

To implement such a scheme will require two different sets of API procedure
calls, one for MAC computation and one for MAC verification. This is simple 
enough, except when we consider the issue of verifying a MAC on a data string 
which is too long to be handled in one call to the module. In such a case it 
will be necessary to implement ‘Types B, M and E’ procedure calls for MAC 
verification. However, it should be clear that the ‘Type B’ and ‘Type M’ calls will 
be indistinguishable from the corresponding calls for a MAC computation. This 
poses significant risks to the separation of MAC computation and verification, 
and will need to be analysed carefully in the design of any module API. 



8 Summary and Conclusions 

There exist Cryptographic APIs and Cryptographic Modules following the model 
discussed in this paper. Sometimes Cryptographic APIs do not pass a chaining
variable value but rather a pointer to the memory location of the chaining
variable (e.g. [11]). It is not clear whether this memory is accessible or not by a
potential attacker: if this memory is part of the cryptographic module, then
presumably it is not accessible; otherwise, the attacks described here are
possible. It has not always been possible to obtain the necessary detail about the
format of the chaining variables produced by the investigated cryptographic
modules and APIs (e.g. [12,11,13,14]). This is understandable as this information
may be regarded as proprietary and sensitive by the manufacturers. The recent
publication of [1] means that MAC algorithm 4, as described there, has probably
not yet been implemented by many manufacturers, and thus the comments
and attacks presented here should be considered as illustrations of what could 
happen, rather than actual attacks on any APIs and cryptographic modules in 
use today. However, the general comments on block-cipher based MACs may be 
relevant even to products in use today, especially if the chaining variables are 
not encrypted or their integrity is not protected. 

When chaining variables have to be imported and exported from a
cryptographic module, it is important that they be output in encrypted form. Moreover,
the encryption should operate in such a way that if the same chaining variable 
is ever output twice, then this will not be evident from the ciphertext. 

Moreover, when Padding Method 3 is used, a further variable should be input 
and output along with the chaining variable. This variable should indicate the 
number of data bits remaining to be processed. In addition, integrity protection 
should be deployed over the entire set of input/output variables, to prevent 
modifications being made. 

Abstract. In this paper we analyse the ECIES encryption algorithm
in the generic group model of computation. This allows us to remove 
the non-standard interactive intractability assumption of the proof of 
security given in the literature. This is done at the expense of requiring 
the generic group model of computation. 



1 Introduction 

The area of ‘provable security’ has in recent years become increasingly important 
in cryptography. This has been driven by two forces. Firstly the increased in- 
ternational standardisation effort has led cryptographers interested in deployed 
systems to increasingly depend on systems and protocols for which the security 
is based on real scientific proof rather than ad hoc arguments. Secondly, the 
tools for provable security have advanced so that schemes as efficient as those 
currently employed can now be fully analysed. 

The most important encryption scheme to come out of this effort has been
RSA-OAEP, as used in almost all standards. The scheme RSA-OAEP replaced
an earlier RSA encryption scheme, called PKCS #1 v1.5, which was shown to be
weak by Bleichenbacher [6]. The RSA-OAEP scheme was a great advance.
Although it cannot be proved secure in the standard model of computation, if one
is prepared to accept proofs in the random oracle model then one can give a
proof of security against active adversaries, see [5], [12] and [19].

There have been a number of attempts to provide “good” encryption schemes 
based on discrete logarithms. The original El Gamal encryption algorithm [11] 
can be shown to be secure against passive adversaries, but it is totally insecure 
against active adversaries. A number of schemes have been given which are secure 
under the standard model of computation, e.g. the Cramer-Shoup scheme [9]. 
However, this latter scheme is less efficient than currently deployed schemes.

In [3] Abdalla, Bellare and Rogaway present an encryption scheme, called
DHIES, which is secure in the standard model of computation. The DHIES
scheme is particularly important since it is as efficient as El Gamal. The combined
effect of a security proof and its efficiency has led DHIES to be standardised
in a number of standards including ANSI X9.63 [1] and SECG [2]. DHIES is
particularly suited to elliptic curve groups, in which case the scheme is slightly
simplified and is called ECIES.

Although the proof of security of ECIES is in the standard model of computation,
it relies on a non-standard intractability assumption, namely an interactive
hash based version of the Decision Diffie-Hellman problem. This has led some
authors to criticise the security proof. An earlier version [4] of [3] had a claim of
a proof of security of ECIES based on the concept of plain-text awareness, and
hence also in the random oracle model; this proof was later withdrawn by the
authors, see [17].

In this paper we adapt the proof of security of ECIES so that it does not
depend on any non-standard intractability assumptions. This is done at the
expense of working in the generic group model, a model which is often used to
reason about elliptic curve systems, see for example [14], [7] and [8]. The generic
group model is like the random oracle model except that instead of an idealised
hash function being modelled we model an idealised finite abelian group of prime
order. In particular we shall prove

Theorem 1. In the generic group model

$$\mathrm{InSec}^{\mathrm{CCA2}}(\mathrm{ECIES}; t, v, w, \mu, m) \leq \mathrm{InSec}(\mathrm{SYM}; t_1, 0, m, m') + 2v\,\mathrm{InSec}(\mathrm{MAC}; t_3, v-1) + \frac{2(v+w)^2}{q}.$$

For the exact definition of each of these quantities we refer to the next sections.

The author would like to thank Alf Menezes for suggesting this problem to 
him and Dan Brown for useful conversations during which the work on this paper 
was carried out. 



2 The ECIES Scheme 

Let $G$ denote a group of prime order $q$ with generator $g$. We require a symmetric
encryption scheme $\mathrm{SYM} = (E_k, D_k)$, a MAC function $\mathrm{MAC}_k$, and a key derivation
function $V$. The key space of SYM will be denoted $K_1$ and the key space
of the MAC will be denoted $K_2$. The key derivation function $V$ will map group
elements to the key space of both the encryption and MAC functions. The key
size of both SYM and MAC will be around $n/2$ where $n = \log_2(q)$, hence the
output of $V$ should be of size approximately $n$.

The properties required of the key derivation function $V$ will be quite mild
when working in the generic group model, namely $V$ could simply
return the bit representation of the input group element. When we turn to real
world implementations we see a real difference between the real world and the
generic group model. We shall return to this in a later section.

The scheme ECIES is defined as a triple of randomised algorithms, {keygen,
enc, dec}.

keygen

1. $x \leftarrow \{1, \ldots, q\}$.
2. $\mathrm{pk} \leftarrow g^x$.
3. $\mathrm{sk} \leftarrow x$.
4. Return $(\mathrm{pk}, \mathrm{sk})$.

enc(pk, m)

1. $k \leftarrow \{1, \ldots, q\}$.
2. $u \leftarrow g^k$, $t \leftarrow \mathrm{pk}^k$.
3. $(k_1, k_2) \leftarrow V(t)$.
4. $c \leftarrow E_{k_1}(m)$.
5. $r \leftarrow \mathrm{MAC}_{k_2}(c)$.
6. Return $e \leftarrow u \| r \| c$.

dec(sk, e)

1. Parse $e$ as $u \| r \| c$.
2. $t \leftarrow u^{\mathrm{sk}}$.
3. $(k_1, k_2) \leftarrow V(t)$.
4. If $r \neq \mathrm{MAC}_{k_2}(c)$
   Return Invalid.
5. $m \leftarrow D_{k_1}(c)$.
6. Return $m$.

The use of an arbitrary symmetric key encryption function allows one to
encrypt arbitrarily long messages. The use of the MAC function is to protect
against chosen ciphertext attacks.
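
The three algorithms can be sketched in Python as follows. This is a toy illustration only: the group is the order-1019 subgroup of the integers modulo 2039 rather than an elliptic curve group, $V$ is taken to be SHA-256 split into two halves, SYM is an XOR with a SHA-256 counter keystream, and MAC is HMAC-SHA256. All of these instantiations, and the function names, are assumptions made for the example and are not prescribed by the scheme.

```python
import hashlib
import hmac
import secrets

P, Q, G = 2039, 1019, 4   # toy parameters: G generates the order-Q subgroup mod P

def kdf(t):
    # Assumed key derivation function V: (k1, k2) from SHA-256 of t.
    digest = hashlib.sha256(t.to_bytes(2, "big")).digest()
    return digest[:16], digest[16:]

def sym(k1, data):
    # Toy symmetric scheme: XOR with a hash-based keystream (E and D coincide).
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(k1 + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def keygen():
    x = secrets.randbelow(Q) + 1          # x in {1, ..., q}
    return pow(G, x, P), x                # (pk, sk)

def enc(pk, m):
    k = secrets.randbelow(Q) + 1
    u, t = pow(G, k, P), pow(pk, k, P)
    k1, k2 = kdf(t)
    c = sym(k1, m)
    r = hmac.new(k2, c, hashlib.sha256).digest()
    return u, r, c                        # stands for u || r || c

def dec(sk, e):
    u, r, c = e
    k1, k2 = kdf(pow(u, sk, P))
    if not hmac.compare_digest(r, hmac.new(k2, c, hashlib.sha256).digest()):
        return None                       # "Invalid"
    return sym(k1, c)

pk, sk = keygen()
ciphertext = enc(pk, b"attack at dawn")
print(dec(sk, ciphertext))
```

Flipping any bit of the symmetric ciphertext component causes the MAC check to fail, which is the role the MAC plays against chosen ciphertext attacks.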



3 Definitions 

It is generally accepted that the “correct” notion of security for public key
encryption functions is one of indistinguishability under adaptive chosen ciphertext
attack, see [13], [15], [16] and [10]. This is defined by a game which the adversary
$A$ plays, made up of two phases. In the first phase, denoted find, the adversary
outputs two messages $m_0$ and $m_1$. Then, hidden from the adversary’s view, a bit
$b$ is chosen. The message $m_b$ is encrypted, to give $e_b$, and the encryption is given
to the adversary. In the second phase, denoted guess, the adversary tries to guess
the bit $b$. The adversary is declared successful if the guessed bit is $b$.

In an adaptive chosen ciphertext attack the adversary has access to a decryption
oracle which he can use to query any ciphertext of his choosing, except for
the target ciphertext $e_b$. Since we are in the public key setting the adversary
clearly has access to an encryption oracle.

More formally we define the advantage of the adversary $A$ as

$$\mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES}) = 2\Pr\left[\begin{array}{l}
(\mathrm{pk}, \mathrm{sk}) \leftarrow \mathrm{ECIES.keygen};\\
(m_0, m_1, s) \leftarrow A^{\mathrm{ECIES.dec}}(\mathrm{find}, \mathrm{pk});\\
b \leftarrow \{0, 1\};\ e \leftarrow \mathrm{ECIES.enc}(\mathrm{pk}, m_b) :\\
A^{\mathrm{ECIES.dec}}(\mathrm{guess}, \mathrm{pk}, s, e) = b
\end{array}\right] - 1.$$

Note, the advantage is considered to lie between 0 and 1, a value of zero
indicating an adversary who does no better than guess the bit $b$, whilst a value of
one indicates that the adversary always guesses correctly. The security of ECIES
is then defined to be

$$\mathrm{InSec}^{\mathrm{CCA2}}(\mathrm{ECIES}; t, v, w, \mu, m) = \max_A \left\{\mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES})\right\},$$

where the maximum is over all adversaries $A$ running in time $t$, making at most
$v$ queries to the decryption oracle of ECIES and $w$ queries to the generic group
oracle. All the decryption queries total at most $\mu$ bits, with the size of the challenge
messages $m_0$ and $m_1$ being at most $m$ bits. An oracle query to the generic group
oracle will consist of a call to either the group operation oracle, the equality
test oracle, the inversion oracle or the oracle to generate a new random group
element. Each of these oracle queries will be assumed to have the same cost.

One can define security of the symmetric encryption scheme $\mathrm{SYM} = (E_k, D_k)$
in a similar way, except now one gives the adversary access to an encryption
rather than a decryption oracle, for the target key $K$. Thus we measure the
ability of SYM to withstand an adaptive chosen plaintext attack. Formally we
define

$$\mathrm{Adv}_A(\mathrm{SYM}) = 2\Pr\left[\begin{array}{l}
K \leftarrow \mathrm{SYM.keygen};\ (m_0, m_1, s) \leftarrow A^{E_K}(\mathrm{find});\\
b \leftarrow \{0, 1\};\ c \leftarrow E_K(m_b) : A^{E_K}(\mathrm{guess}, s, c) = b
\end{array}\right] - 1.$$

The security of SYM is then defined to be

$$\mathrm{InSec}(\mathrm{SYM}; t, \mu, m, m') = \max_A \left\{\mathrm{Adv}_A(\mathrm{SYM})\right\},$$

where the maximum is over all adversaries $A$ running in time at most $t$, asking
queries of $E_K$ totalling at most $\mu$ bits in length. The output of the find stage is
at most $m$ bits, whilst $m'$ is an upper bound on the length of a ciphertext
corresponding to a plaintext of $m$ bits in length.

Just as in [3] we define security of the MAC function in terms of an adversary
who is given an oracle for the MAC function with respect to some hidden key
$K$. The adversary is deemed successful if it can output a (message, MAC) pair
which has not yet been asked of the MAC oracle. Formally

$$\mathrm{Succ}_A(\mathrm{MAC}) = \Pr\left[\begin{array}{l}
K \leftarrow \mathrm{MAC.keygen};\ (x, r) \leftarrow A^{\mathrm{MAC}(K, \cdot)} :\\
(x, r) \text{ is unasked and } \mathrm{MAC}(K, x) = r
\end{array}\right].$$

The security of MAC is defined to be

$$\mathrm{InSec}(\mathrm{MAC}; t, v) = \max_A \left\{\mathrm{Succ}_A(\mathrm{MAC})\right\},$$

where the maximum is over all adversaries $A$ running in time at most $t$ and
making at most $v$ oracle queries to the MAC oracle.

In all the above definitions the time t refers to the maximal number of steps 
that the algorithm requires in some fixed model of computation, and where each 
oracle query shall count as a single step. 

We end this section by discussing the standard Decision Diffie-Hellman problem.
This is the problem: Given $g^x$, $g^y$ and $g^z$ determine whether $z = xy \pmod{q}$.
Formally we have

$$\mathrm{Adv}^{\mathrm{DDH}}_A(G) = \Pr[u, v \leftarrow \{1, \ldots, q\} : A(g^u, g^v, g^{uv}) = 1] - \Pr[u, v \leftarrow \{1, \ldots, q\};\ h \leftarrow G : A(g^u, g^v, h) = 1].$$

Note, by results of Shoup [18, Theorem 4], we have

Theorem 2. In the generic group model

$$\mathrm{Adv}^{\mathrm{DDH}}_A(G) \leq w^2/q,$$

where $w$ is the number of queries $A$ makes to the generic group oracle.
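
To give a feel for the size of this bound, the following back-of-the-envelope calculation reads Theorem 2 as a bound of order $w^2/q$ and plugs in illustrative figures that are not taken from the paper: a group order $q \approx 2^{160}$ and an adversary making $w = 2^{60}$ generic group queries.

```latex
\frac{w^2}{q} \approx \frac{(2^{60})^2}{2^{160}} = 2^{120-160} = 2^{-40},
\qquad\text{whereas}\qquad
\frac{w^2}{q} \approx 1 \iff w \approx \sqrt{q} \approx 2^{80}.
```

Under these assumed figures the distinguishing advantage remains negligible until the number of group operations approaches the square root of the group order.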

4 Proof

The proof technique is exactly the same as the proof technique in [3]. The only
difference is that we replace the interactive calls to the hash Diffie-Hellman oracle
by calls to a generic group operation. This allows us to remove the need for the
non-standard intractability assumption, at the expense of assuming the generic
group model.

We present the full proof, rather than simply referring to [3], since we wish to
point out exactly how the proof depends on our assumption of a generic group
model. In particular we wish to keep track of the number of oracle queries to the
generic group oracle.

Let $A$ denote an adversary attacking ECIES in the sense above. Assume it has
running time at most $t$, makes $v$ queries to its decryption oracle and $w$ queries
to its generic group oracle. At the end of its find stage we assume it outputs a
string of length at most $m$. Let $m'$ denote an upper bound on the length of an
encryption under SYM of a plaintext of length at most $m$. We wish to use $A$ to
attack the security of the decision Diffie-Hellman problem in our generic group,
the underlying symmetric encryption system and the underlying MAC. To do
this we will need to give $A$ a simulation of its decryption oracle. We shall first
describe the basic simulator, which we shall then modify below. The simulator
will maintain an internal list of group pairs $L$, which “represent” elements and
their discrete logarithms. These are not “correct” discrete logarithms but the
adversary will not be able to tell the difference.

Decryption Simulator(e)

1. Parse $e$ as $u \| r \| c$.
2. If $(u, t') \in L$ for some $t'$, set $t \leftarrow t'$, else $t \leftarrow G$.
3. $L \leftarrow L \cup \{(u, t)\}$.
4. $(k_1, k_2) \leftarrow V(t)$.
5. If $r \neq \mathrm{MAC}_{k_2}(c)$ return Invalid.
6. $m \leftarrow D_{k_1}(c)$.
7. Return $m$.

Notice that algorithm $A$, when we supply it with the above decryption simulator,
will now make at most $v + w$ calls to the generic group oracle.
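
The bookkeeping performed by the simulator can be sketched as follows. The sketch is illustrative only: a small subgroup modulo 2039 stands in for the generic group, drawing $t \leftarrow G$ is modelled by raising a fixed generator to a random exponent, and the key derivation, cipher and MAC (SHA-256, an XOR pad and HMAC-SHA256) are assumed instantiations with hypothetical names.

```python
import hashlib
import hmac
import secrets

P, Q, G = 2039, 1019, 4   # toy group: order-Q subgroup mod P (assumed parameters)

def kdf(t):
    # Hypothetical key derivation V: split SHA-256 of t into (k1, k2).
    digest = hashlib.sha256(t.to_bytes(2, "big")).digest()
    return digest[:16], digest[16:]

def xor_cipher(k1, data):
    # Toy symmetric scheme for messages of at most 32 bytes.
    pad = hashlib.sha256(k1).digest()
    return bytes(a ^ b for a, b in zip(data, pad))

class DecryptionSimulator:
    def __init__(self):
        self.table = {}                       # plays the role of the list L

    def lookup(self, u):
        # Steps 2-3: reuse t if u was seen before, else draw a fresh random t.
        if u not in self.table:
            self.table[u] = pow(G, secrets.randbelow(Q) + 1, P)
        return self.table[u]

    def decrypt(self, u, r, c):
        k1, k2 = kdf(self.lookup(u))          # step 4
        tag = hmac.new(k2, c, hashlib.sha256).digest()
        if not hmac.compare_digest(r, tag):   # step 5
            return None                       # "Invalid"
        return xor_cipher(k1, c)              # steps 6-7

sim = DecryptionSimulator()
u = pow(G, 5, P)
k1, k2 = kdf(sim.lookup(u))
c = xor_cipher(k1, b"hello")
r = hmac.new(k2, c, hashlib.sha256).digest()
print(sim.decrypt(u, r, c))
```

The only property the sketch aims to show is consistency: repeated queries with the same $u$ are answered using the same $t$, without the simulator ever knowing a private key.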

We now describe an algorithm $B$ which will use $A$ to break the encryption
scheme. Recall from the previous definition that $B$ has access to an oracle for
encryption, but not decryption, and that it runs in two stages.

Algorithm $B^O$(find)

1. $\mathrm{pk} \leftarrow G$.
2. $(m_0, m_1, s) \leftarrow A(\mathrm{find}, \mathrm{pk})$.
3. Output $(m_0, m_1, (m_0, m_1, s, \mathrm{pk}))$.

Algorithm $B^O$(guess, $c'$, $s'$)

1. Parse $s'$ as $(m_0, m_1, s, \mathrm{pk})$.
2. $u \leftarrow G$.
3. $k_2 \leftarrow K_2$.
4. $r \leftarrow \mathrm{MAC}_{k_2}(c')$.
5. $e \leftarrow u \| r \| c'$.
6. $b \leftarrow A(\mathrm{guess}, \mathrm{pk}, s, e)$.
7. Return $b$.

Clearly $B$'s running time is

$$t_1 = O(t + 2\,\mathrm{TIME}_G + \mathrm{TIME}_{\mathrm{MAC.gen}}(m')).$$

When applying algorithm $B$ we use the decryption simulator given above, with
one minor modification which we shall describe in a moment. Hence, $B$ requires
$v$ queries to the decryption simulator and a total of $2 + v + w$ queries to the group
oracle. The time for these queries is already accounted for in the time $t$ needed
to run algorithm $A$. Note that although algorithm $B$ has access, by definition,
to an encryption oracle $O$ it does not actually use this oracle.

Due to the definition of security of SYM we need to modify the decryption
simulator in one crucial way. This is because $B$ is not allowed access to the
decryption algorithm for the key $k$ “corresponding” to the group element $u$
in the target ciphertext $e$. We call a Type Q query, of the decryption simulator,
one for which the input is of the form

$$u \| r' \| c'$$

for some $r'$ and $c'$ and for which the simulator does not output Invalid. As we
are in the generic group model, we see that it is impossible for $B$ to construct a
non Type Q query which will result in the access of the decryption algorithm
$D_k$ for the key $k$, hence it is only the Type Q queries which we need to avoid.
Hence, we modify the simulator as follows. If a Type Q query is made then
Algorithm $B$ will terminate by returning $b \leftarrow \{0, 1\}$.

We shall now consider an algorithm $C$ which will use algorithm $A$ to attempt
to solve the Decision Diffie-Hellman problem in $G$. It will take as input $g^a$, $g^b$
and $g^c$, and output one if it believes $c = ab$, and zero otherwise.

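Algorithm $C(g^a, g^b, g^c)$

1. $u \leftarrow g^a$.
2. $\mathrm{pk} \leftarrow g^b$.
3. $(k_1, k_2) \leftarrow V(g^c)$.
4. $(m_0, m_1, s) \leftarrow A(\mathrm{find}, \mathrm{pk})$.
5. $b \leftarrow \{0, 1\}$.
6. $c \leftarrow E_{k_1}(m_b)$.
7. $r \leftarrow \mathrm{MAC}_{k_2}(c)$.
8. $e \leftarrow u \| r \| c$.
9. $b' \leftarrow A(\mathrm{guess}, \mathrm{pk}, s, e)$.
10. If $b' = b$ Return one, else Return zero.

Algorithm $C$’s running time is bounded by

$$t_2 = O(t + \mathrm{TIME}_{\mathrm{SYM.enc}}(m) + \mathrm{TIME}_{\mathrm{MAC.gen}}(m'))$$

and it makes $v$ queries to algorithm $A$’s decryption oracle, which means that
using the above simulator it requires $v + w$ queries to the generic group oracle.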

When algorithm $C$ is given a valid Diffie-Hellman triple as input it runs
algorithm $A$ just as one would run $A$ to mount an attack against ECIES. Hence,

$$\Pr[a, b \leftarrow \{1, \ldots, q\};\ c \leftarrow a \cdot b : C(g^a, g^b, g^c) = 1] = \frac{1 + \mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES})}{2}. \qquad (1)$$

Now suppose $C$ is not given a valid Diffie-Hellman triple. When $A$ does not make
a Type Q query, $C$ runs $A$ in the same way that $B$ runs $A$. Hence,

$$\Pr[a, b, c \leftarrow \{1, \ldots, q\} : C(g^a, g^b, g^c) = 1 \wedge \neg\text{Type Q}] \leq \frac{1 + \mathrm{Adv}_B(\mathrm{SYM})}{2}.$$

Now since $B$ makes zero encryption queries and runs in time at most $t_1$ we
obtain

$$\Pr[a, b, c \leftarrow \{1, \ldots, q\} : C(g^a, g^b, g^c) = 1 \wedge \neg\text{Type Q}] \leq \frac{1 + \mathrm{InSec}(\mathrm{SYM}; t_1, 0, m, m')}{2}. \qquad (2)$$

When $A$ makes a Type Q query of its decryption oracle we see that $C$ runs
$A$ just as the following algorithm $D$ runs $A$ to break the MAC function. We
first present algorithm $D$ and then present the modification to the decryption
simulator which is needed. Note that algorithm $D$ is given access to an oracle $O$
for the keyed hash function, for the key “corresponding” to the group element $u$
in the target ciphertext.

Algorithm $D^O$

1. $\mathrm{pk}, u \leftarrow G$.
2. $k_1 \leftarrow K_1$.
3. $j \leftarrow \{1, \ldots, v\}$.
4. $(m_0, m_1, s) \leftarrow A(\mathrm{find}, \mathrm{pk})$.
5. $b' \leftarrow \{0, 1\}$.
6. $c \leftarrow E_{k_1}(m_{b'})$.
7. $r \leftarrow O(c)$.
8. $e \leftarrow u \| r \| c$.
9. Run $A(\mathrm{guess}, \mathrm{pk}, s, e)$.
10. Return $W$.

The modified decryption simulator runs as follows. Each call to the decryption
simulator is given an index $i \in \{1, \ldots, v\}$.

Decryption Simulator($e_i$)

1. Parse $e_i$ as $u' \| r' \| c'$.
2. If $(u', t') \in L$ for some $t'$, set $t \leftarrow t'$, else $t \leftarrow G$.
3. $L \leftarrow L \cup \{(u', t)\}$.
4. $(k_1', k_2') \leftarrow V(t)$.
5. If $i \neq j$ and $u' = u$ then
6.     If $O(c') = r'$ then
7.         Return $D_{k_1}(c')$
8.     Else Return Invalid.
9. Else if $i \neq j$ and $u' \neq u$ then
10.     If $\mathrm{MAC}_{k_2'}(c') = r'$ then
11.         Return $D_{k_1'}(c')$
12.     Else Return Invalid.
13. Else
14.     $W \leftarrow (c', r')$.
15.     Return $D_{k_1}(c')$.



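Suppose algorithm $A$ makes a Type Q query $e'$ to its decryption oracle. Let
$i$ denote the number of one such query, with $e_i = u \| r' \| c'$. Now if $j$ in Algorithm
$D$ takes this value of $i$ then $D$ succeeds in breaking MAC since $(c', r')$ will be
a valid pair. Since $j$ is chosen uniformly from $\{1, \ldots, v\}$, the equality $j = i$
happens with probability $1/v$, and so we obtain

$$\Pr[\text{Algorithm } A \text{ makes a query of Type Q}] \leq v\,\mathrm{Succ}_D(\mathrm{MAC}).$$

Algorithm $D$ makes at most $v - 1$ queries to its MAC oracle $O$ and runs in time

$$t_3 = O(t + 2\,\mathrm{TIME}_G + \mathrm{TIME}_{\mathrm{MAC.gen}}(m') + \mathrm{TIME}_{\mathrm{SYM.enc}}(m)),$$

since the time to cope with the decryption oracle queries is included in $t$. We
therefore have

$$\Pr[a, b, c \leftarrow \{1, \ldots, q\} : C(g^a, g^b, g^c) = 1 \wedge \text{Type Q}] \leq v\,\mathrm{InSec}(\mathrm{MAC}; t_3, v - 1). \qquad (3)$$

Finally combining inequalities (1), (2) and (3) we obtain

$$\mathrm{Adv}^{\mathrm{DDH}}_C(G) \geq \frac{\mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES})}{2} - \frac{\mathrm{InSec}(\mathrm{SYM}; t_1, 0, m, m')}{2} - v\,\mathrm{InSec}(\mathrm{MAC}; t_3, v - 1).$$

Recall that $C$ makes $v + w$ queries to the generic group oracle and so by Shoup’s
result we obtain

$$\frac{(v + w)^2}{q} \geq \frac{\mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES})}{2} - \frac{\mathrm{InSec}(\mathrm{SYM}; t_1, 0, m, m')}{2} - v\,\mathrm{InSec}(\mathrm{MAC}; t_3, v - 1).$$

In other words

$$\mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES}) \leq \mathrm{InSec}(\mathrm{SYM}; t_1, 0, m, m') + 2v\,\mathrm{InSec}(\mathrm{MAC}; t_3, v - 1) + \frac{2(v + w)^2}{q}.$$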

Finally since $A$ was an arbitrary adversary subject to the constraint that it run
in $t$ steps, made $v$ calls to its decryption oracle all of which total at most $\mu$ bits,
$w$ calls to its generic group oracle and the size of the output from its find stage
was at most $m$ we conclude

$$\max_A \mathrm{Adv}^{\mathrm{CCA2}}_A(\mathrm{ECIES}) = \mathrm{InSec}^{\mathrm{CCA2}}(\mathrm{ECIES}; t, v, w, \mu, m).$$

The main theorem now follows.



5 Practical Considerations 

The original DHAES paper suggested using a key derivation function of the form
$V(u, t)$ rather than $V(t)$. Using $V(t)$ alone can lead to trivial malleability of the
ciphertext in the case when a group of non-prime order is used. To see this, assume we have a
ciphertext of the form $e = u \| r \| c$ corresponding to the plaintext $m$, where $u$ is
of prime order $q$ but the underlying group is of order $2q$. If the private key $x$ is
even then one trivially obtains the following valid ciphertext, corresponding to
the plaintext $m$,

$$u \cdot h \| r \| c$$

where $h$ is an element of order two. To avoid this trivial malleability, which is not
known to cause a problem in practice, we need to either use the key derivation
function $V(u, t)$ or check that for each received ciphertext $u$ is an element of
order $q$.

In our analysis we have assumed a generic group of prime order. When we 
instantiate our generic group to the type of elliptic curves used in practical 
systems we can avoid this problem in a number of ways: 

Large Prime Characteristic : In this case the elliptic curve group order is 
usually chosen to have prime order. Hence, simply checking that the point u lies 
on the curve will avoid the above malleability issue. 

Characteristic Two : In this case the elliptic curve group order is usually
chosen to be twice a prime, i.e. $\#E(\mathbb{F}_{2^m}) = 2q$, and the value of m is
prime. Assume the elliptic curve is given in the form

$$y^2 + xy = x^3 + x^2 + b.$$

The following Lemma lets us easily check that u has prime order q in this case.

Lemma 1. Assume m is odd, $\#E(\mathbb{F}_{2^m}) = 2q$ with q prime and $P = (x, y) \in E(\mathbb{F}_{2^m})$.
The point P has order q if and only if

$$\mathrm{Tr}_{\mathbb{F}_{2^m}|\mathbb{F}_2}(x) = 1 \quad\text{and}\quad \mathrm{Tr}_{\mathbb{F}_{2^m}|\mathbb{F}_2}(b/x^2) = 0.$$

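Proof. If the point P has order q then one can find a point $Q = (x_1, y_1)$ such
that

$$P = [2]Q.$$

We then have that

$$\lambda = x_1 + \frac{y_1}{x_1} \in \mathbb{F}_{2^m}$$

where λ is a root of the equation

$$\lambda^2 + \lambda + 1 + x = 0.$$

Now the existence of $\lambda \in \mathbb{F}_{2^m}$ is equivalent to $\mathrm{Tr}_{\mathbb{F}_{2^m}|\mathbb{F}_2}(x) = 1$
(the equation is solvable exactly when $\mathrm{Tr}(x + 1) = 0$, and $\mathrm{Tr}(1) = 1$ because m is odd).

We are left with showing, given $\lambda \in \mathbb{F}_{2^m}$, the existence of $x_1, y_1 \in \mathbb{F}_{2^m}$.
We obtain two simultaneous equations, in $x_1$ and $y_1$, given by

$$y_1 = x_1^2 + \lambda x_1$$

$$0 = y_1^2 + x_1 y_1 + x_1^3 + x_1^2 + b$$

Eliminating $y_1$ we obtain the equation, with $\tau = x_1^2$,

$$0 = \tau^2 + (\lambda^2 + \lambda + 1)\tau + b = \tau^2 + x\tau + b$$

which will have a solution if and only if $\mathrm{Tr}_{\mathbb{F}_{2^m}|\mathbb{F}_2}(b/x^2) = 0$.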
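
The two trace conditions of Lemma 1 are cheap to evaluate. The following is a minimal sketch only, assuming a toy field $\mathbb{F}_{2^5}$ built with the reduction polynomial $x^5 + x^2 + 1$; the names gf_mul, gf_trace, gf_inv and point_has_order_q are hypothetical and do not come from the paper. It evaluates the two conditions for a given x-coordinate and curve coefficient b; it does not check that the point lies on the curve, nor that the curve order is really 2q.

```c
#include <stdint.h>

#define GF_M    5      /* extension degree m (odd, as Lemma 1 requires) */
#define GF_POLY 0x25u  /* x^5 + x^2 + 1, assumed reduction polynomial   */

/* Multiplication in F_{2^m}: shift-and-add with reduction. */
static uint32_t gf_mul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    while (b) {
        if (b & 1u) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & (1u << GF_M)) a ^= GF_POLY;
    }
    return r;
}

/* Absolute trace Tr(a) = a + a^2 + a^4 + ... + a^(2^(m-1)); value 0 or 1. */
static uint32_t gf_trace(uint32_t a)
{
    uint32_t t = 0, p = a;
    for (int i = 0; i < GF_M; i++) {
        t ^= p;
        p = gf_mul(p, p);
    }
    return t;
}

/* Inverse of a nonzero element: a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1)). */
static uint32_t gf_inv(uint32_t a)
{
    uint32_t r = 1, p = gf_mul(a, a);
    for (int i = 1; i < GF_M; i++) {
        r = gf_mul(r, p);
        p = gf_mul(p, p);
    }
    return r;
}

/* Lemma 1 test on the x-coordinate: Tr(x) = 1 and Tr(b / x^2) = 0. */
static int point_has_order_q(uint32_t x, uint32_t b)
{
    if (x == 0) return 0;  /* x = 0 belongs to the point of order two */
    uint32_t xi = gf_inv(x);
    return gf_trace(x) == 1u && gf_trace(gf_mul(b, gf_mul(xi, xi))) == 0u;
}
```

In a real implementation the field would be one of the standardized binary fields and the trace would typically be computed with a precomputed mask rather than m squarings.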

We now turn to a discussion of the function V. In the generic group model
there was little restriction required on such a function, and we could assume that
it simply output the bit representation of the underlying group element. For any
particular group, e.g. elliptic curves, one needs to be more careful. For example,
one should not just take the x-coordinate as the output of the key derivation
function; one should also take a contribution from the y-coordinate. This is to
stop the obvious collision V(P) = V(−P) when only the x-coordinate is used as
input to V.

But, even if we used the bit representation of the compression of the point P,
our proof still does not apply when the generic group is replaced by an elliptic
curve group. This is because at one crucial point we assumed that it was hard
for algorithm B to solve the following problem: Given public u ∈ G and private
x ∈ Z, find a u' ∈ G with u' ≠ u such that

$$V|_{\mathrm{SYM}}([x]u') = V|_{\mathrm{SYM}}([x]u),$$

where $V|_{\mathrm{SYM}}$ is the function V where the output is projected down to the key
space of the symmetric encryption function only. It is clear that this problem is
easy when $V|_{\mathrm{SYM}}(P)$ is defined to be the first n/2 bits of the point compression
of the elliptic curve point P, namely the first n/2 bits of the x-coordinate of P,
since then

$$V|_{\mathrm{SYM}}([x]P) = V|_{\mathrm{SYM}}([x](-P)).$$

Hence, our technique of using the generic group to remove the interactive
Decision Diffie-Hellman assumption from the proof in [3] has essentially hidden the
problem of a non-trivial interaction between the key derivation function V and
the group. In [3] this is also hidden within the interactive Decision Diffie-Hellman
assumption. Hence, when instantiating the protocol in a real-life situation some
care needs to be taken. For example, V should be instantiated using a hash
function such as SHA-1.

Perhaps the most interesting conclusion from this discussion is that, whilst 
ECIES is secure in the generic group model with a trivial definition for V, when 
we instantiate the group with an elliptic curve group, the protocol (with the 
same trivial definition of V) becomes insecure. Hence, this is possible evidence 
that elliptic curve groups should not be modelled by generic groups.

Abstract. This paper presents a new stream cipher family whose
output bits are produced by blocks. We particularly focus on the
member of this family producing 128-bit blocks with a 256-bit key. The
design is based on a new technique called crossing over, which allows
stream ciphering to be vectorized by using nonlinear shift registers. These
algorithms offer very high cryptographic security and much higher
encryption speed than any existing stream ciphers or block ciphers,
particularly the AES candidates. A cryptanalysis challenge with a
1000 euro reward is proposed.

Keywords: stream cipher, nonlinear feedback shift register, vectorized
cipher, high speed encryption, Boolean functions, block cipher.



1 Introduction 

Composing cryptographic primitives makes it possible to build other
primitives. One famous example is that of block ciphers, which can
simulate stream ciphers through output feedback mode [4,18].

The main advantage of this approach is provable security. An attack on
such a simulated stream cipher would indeed be equivalent to cryptanalyzing
the underlying block cipher. Such an approach is described in [3] for the ANSI
message authentication code, built from a block cipher (there are, however, a few
exceptions: for example, CBC mode revealed vulnerabilities independent of
the quality of the underlying block cipher).

However, building stream ciphers from block ciphers or hash functions has
a major drawback: block ciphers are usually much slower than true
stream ciphers. Moreover, they are not as suitable as stream ciphers for VLSI
implementations. Today's rapidly growing need for secure communications calls
for very high encryption speeds, which block ciphers achieve only with some
difficulty. At present, the fastest block cipher software implementation runs at
nearly 260 Mbits/s (RC6 [2]). Even the few known software-optimized stream
ciphers, such as SEAL [22], offer relatively limited encryption speed.

On the other hand, the security of the most common variants of stream
ciphers (based on linear feedback shift registers and suitable Boolean functions)
is increasingly being challenged and questioned. Recent powerful attacks
[5,6,8,12] have shown that effective, real-life cryptanalysis is becoming more and
more feasible. In terms of security, recent developments in cryptology (the NIST
call for AES [19] being the most obvious) tend to favour block ciphers.

One approach that has not often been considered so far is to take the best
of both worlds: to simulate a block cipher by vectorizing stream
ciphers. It would combine the advantages of both sides without their
respective drawbacks. The first known attempts were presented in [13,15], with
keyed hash functions built from stream ciphers, and in [1], with block ciphers
built from SHA-1 and SEAL.

In this paper, we present a completely new approach to cipher design. By
using only the cryptographically strongest known Boolean functions and Non
Linear Feedback Shift Registers (NLFSR), we build a family of vectorized binary
additive stream ciphers (that is, ciphers whose output bits are produced as a
block of L bits in a row) called COS ciphers, with arbitrary output block size. It
is based on a new construction called crossing over, which allows the internal
state at instant t to be used as a constituent of the ciphering output blocks or
vectors. The latter are bitwise xored with blocks of plaintext to produce blocks
of ciphertext.

For each system of the COS family two modes are possible. The first one
(mode I) accepts any kind of plaintext. The second one (mode II) is restricted to
plaintext without redundancy (e.g. compressed). The latter has been chosen for
the Internet Film Independent Cinema (IFIC) project of the European PRIAMM
call [21].

Given fast software implementation of Boolean functions and shift registers,
and given that VLSI implementations require fewer gates than for block ciphers,
our scheme proves far better than previous ones in practical applications
requiring large block sizes and high encryption speed. Experiments have reached
very high encryption speeds: about 330 Mbits/s with 128-bit block size and
nearly 1 Gbits/s in a 512-bit block size setting (for mode II).

In terms of security, we primarily rely on the cryptographic strength of the
constituent primitives. The crossing over mechanism then allows them to be
combined in a very secure way. The key size can be 128, 192 or 256 bits. These
aspects ensure a high level of security.

This paper is organized as follows. We first present in Section 2 the COS 
cipher family itself, and particularly the 128-bit block size version. Section 3 will 
discuss security aspects. Section 4 presents the performance of our implementa- 
tions (encryption speed and code size). Section 5 finally presents the rewarded 
cryptanalysis challenge we propose. 

More materials and data on this cipher (C implementation, test vectors, ...)
can be found in [7].

2 The COS Ciphers

2.1 General Description

The COS cipher design centers exclusively on n + 1 NonLinear Feedback Shift
Registers (NLFSR): $M, L_1, L_2, \ldots, L_n$.

Consider the 2L-bit block size version. The M register is devoted to the key
setup, that is to say to the generation of the initial states of the
$L_1, L_2, \ldots, L_n$ registers. M is 4L bits long. The $L_i$ registers, for their
part, generate the output ciphering blocks through the crossing over mechanism.
Each of them is 2L bits long and is irregularly clocked.

The feedback Boolean functions of the registers $L_i$ have been chosen
according to their cryptographic properties. They are optimally balanced, highly
nonlinear and have high correlation-immunity order. Moreover, their Algebraic
Normal Form (ANF), that is to say their representation as multivariate
polynomials, has to present a particular structure to meet important security
requirements defined in [10]. We have considered the strongest known functions
meeting all these criteria. After numerous simulations, a careful choice has been
made among the functions proposed in [9,16,17,24]. Optimal functions have been
taken: an eleven-variable function for the M register feedback and nine-variable
functions for the $L_i$ register feedback.

The central point of the COS design lies in the crossing over technique, inspired
by the mechanism of the same name in chromosome genetic differentiation (hence
the name COS, standing for Crossing Over System). Consider n registers
$L_1, L_2, \ldots, L_n$ of length 2L. If $MSB(L_i)$ and $LSB(L_i)$ denote respectively
the L most significant bits and the L least significant bits of $L_i$, then one
output block generation step can be summarized as follows:

1. Clock $L_i$ at least L times.

2. Generate the 2(n − 1) output L-bit blocks $B_k$ (k = 1, ..., 2(n − 1)) as follows,
   for j = 1, ..., n such that j ≠ i:

   $MSB(B_k) = MSB(L_i) \oplus LSB(L_j)$

   $LSB(B_k) = LSB(L_i) \oplus MSB(L_j)$

3. i = i + 1 mod n

In mode II all the L-bit blocks are used whereas in mode I, only two randomly
chosen L-bit blocks will be used.

It is obvious that the security of such a mechanism relies on the nonlinearity
of the feedback. This aspect is discussed in Section 3.2. In terms of performance,
we see that n 2L-bit registers produce, in each step, 2(n − 1) L-bit blocks with
only (at least) L register clockings. Performance is evaluated in Section 4.

From a general point of view we will then speak of an (n, 2L) COS cipher
to describe one particular member of the COS family. Appendix B presents the
general diagram of the (2, 2L) COS ciphers. We now present the (2, 128) cipher
in more detail. Details of implementation and specifications can be found in [7].

2.2 Key Setup of the (2, 128) COS Cipher

Whatever the values of n and L, any (n, 2L) COS cipher works with an
internal key K of 256 bits and a message key MK of 32 bits. So the first step
in key setup expands shorter keys of either 128 or 192 bits to 256 bits.

Let us represent the M register (256 bits) by eight 32-bit words M[0], M[1],
..., M[7]. In the same way, we will represent the key by K[0], ..., K[s] where
s = 3, 5, 7 according to the user's key size, and the $L_1$ and $L_2$ registers by
L1[0], ..., L1[3] and L2[0], ..., L2[3].

We first fill up the M register with the user's key:

M[i] = K[i]    i = 0, ..., s

The expansion step then uses the user's key to produce the missing bits that fill
up the remaining part of the M register, M[s + 1], ..., M[7]. A look-up table T
maps 8-bit blocks of the user's key to 8-bit blocks. Due to lack of space, this
table is given in [7]. To be more precise, if $K_i$ denotes the 8-bit block of
the user's key starting at bit i, we have:

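• s = 3 (user's key has 128 bits)

$M[4] = 2^{24} \cdot T[K_{248}] + 2^{16} \cdot T[K_{240}] + 2^{8} \cdot T[K_{232}] + T[K_{224}]$

$M[5] = 2^{24} \cdot T[K_{216}] + 2^{16} \cdot T[K_{208}] + 2^{8} \cdot T[K_{200}] + T[K_{192}]$

$M[6] = 2^{24} \cdot T[K_{184}] + 2^{16} \cdot T[K_{176}] + 2^{8} \cdot T[K_{168}] + T[K_{160}]$

$M[7] = 2^{24} \cdot T[K_{152}] + 2^{16} \cdot T[K_{144}] + 2^{8} \cdot T[K_{136}] + T[K_{128}]$

• s = 5 (user's key has 192 bits)

$M[6] = 2^{24} \cdot T[K_{248} \oplus K_{216} \oplus K_{184}] + 2^{16} \cdot T[K_{240} \oplus K_{208} \oplus K_{176}] + 2^{8} \cdot T[K_{232} \oplus K_{200} \oplus K_{168}] + T[K_{224} \oplus K_{192} \oplus K_{160}]$

$M[7] = 2^{24} \cdot T[K_{152} \oplus K_{120} \oplus K_{88}] + 2^{16} \cdot T[K_{144} \oplus K_{112} \oplus K_{80}] + 2^{8} \cdot T[K_{136} \oplus K_{104} \oplus K_{72}] + T[K_{128} \oplus K_{96} \oplus K_{64}]$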

The 32-bit message key MK is then combined with the initial state of register M:

M[0] = M[0] ⊕ MK

This key MK is to be changed for every different message. Since the intrinsic
security of the cipher does not rely on MK, it can be transmitted with the
message. Its role is to prevent different messages from being encrypted with the
same internal 256-bit base key.

The M register is then clocked 256 times. The eleven-variable feedback
function f11 takes bits 2, 31, 57, 87, 115, 150, 163, 171, 201, 227 and 255 of M as
input bits and outputs one feedback bit (see Figure 1). The representation of
f11 is too large to be given here; it can be found in [7]. Its properties are
presented in Section 3.1.

After these 256 clockings, the internal state of the M register initializes the
$L_1$ register as follows:

L1[i] = M[i + 4]    i = 0, ..., 3

Fig. 1. Clocking of the M register

M is then clocked 128 times and $L_2$ is initialized as follows:

L2[i] = M[i]    i = 0, ..., 3



2.3 Encryption - Decryption of the (2,128) COS Cipher 

The output blocks of the cipher are xored with either plaintext blocks
(encipherment) or ciphertext blocks (decipherment), so we just need to describe
how these 128-bit blocks are generated. Each step $S_i$ outputs one 128-bit block
and decomposes as follows:

1. Compute the clocking value d.

   a) Compute $clk = 2 \cdot lsb(L_2) + lsb(L_1)$.

   b) d = C[clk] where C[0, ..., 3] = {64, 65, 66, 64}.

2. Clock $L_1$ if i is even or $L_2$ if i is odd, d times.

3. Produce a 128-bit block $B_i$ in this way (see Figure 2):

   $B_i = (L1[1] \oplus L2[3]) + 2^{32} \cdot (L1[0] \oplus L2[2]) + 2^{64} \cdot (L1[3] \oplus L2[1]) + 2^{96} \cdot (L1[2] \oplus L2[0])$

4. (Mode I only) Compute block indices j and k and keep only blocks j and k.
   If $lsb_2(L)$ and $msb_2(L)$ denote the two least, respectively most, significant
   bits of L, then the block indices j and k are computed as follows:

   $j = lsb_2(L_1) \oplus lsb_2(L_2)$    $k = msb_2(L_1) \oplus msb_2(L_2)$

   To prevent equality between j and k, that is to say repeated output blocks,
   we set $k = j \oplus 1$ when j = k.

The feedback Boolean functions have 9 variables (f9a and f9b) and both use bits
2, 5, 8, 15, 26, 38, 44, 47 and 57 of $L_1$ and $L_2$. Interested readers will find the
complete functions in [7].
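
Steps 1 and 3 above can be sketched in a few lines of C. This is an illustrative sketch under stated assumptions, not the reference implementation: the names cos_clock_count and cos_cross_over are hypothetical, registers are held as four 32-bit words, and the least significant bit of a register is taken from word [3], following the convention of the listing in Appendix A. The register clocking itself (step 2) needs the feedback tables from [7] and is left out.

```c
#include <stdint.h>

/* Clocking table C[0..3] from step 1b. */
static const unsigned COS_CLOCKS[4] = {64, 65, 66, 64};

/* Step 1: d = C[clk] with clk = 2 * lsb(L2) + lsb(L1). */
static unsigned cos_clock_count(const uint32_t L1[4], const uint32_t L2[4])
{
    unsigned clk = (unsigned)(L1[3] & 1u) | ((unsigned)(L2[3] & 1u) << 1);
    return COS_CLOCKS[clk];
}

/* Step 3: crossing over. block[0] is the most significant word
   (the 2^96 term), block[3] the least significant one. */
static void cos_cross_over(const uint32_t L1[4], const uint32_t L2[4],
                           uint32_t block[4])
{
    block[0] = L1[2] ^ L2[0];
    block[1] = L1[3] ^ L2[1];
    block[2] = L1[0] ^ L2[2];
    block[3] = L1[1] ^ L2[3];
}
```

Each output word mixes one word of $L_1$ with one word of $L_2$, so no register word ever appears unmasked in the output block.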

3 Security Evaluation 

The security of the COS cipher rests primarily on the feedback Boolean functions
and on the crossing over mechanism. The key size is large enough to resist
exhaustive key search. In this kind of NLFSR-based design, correlation attacks
[5,6,8,12,20,23], linear syndrome attacks [25,26], ... no longer work and are not
even meaningful. Indeed they can be used only when the register feedback is
linear.

Fig. 2. Output block generation

Thus concerning security, only two aspects have to be considered and con- 
trolled: randomness and presence of cycles. Both have great importance to pre- 
vent predictability on the ciphering output blocks. 



3.1 The Boolean Functions 

They are primarily important for yielding the best possible randomness. Three
Boolean functions have been chosen for the register feedback: f11 for the M
register and f9a, f9b for the $L_1$ and $L_2$ registers. Function f11 is common to all
members of the COS family. Additional functions, for n > 2, have been chosen
for the other members [7].

The Boolean functions presenting the best cryptographic properties trade-offs 
(balancedness, correlation-immunity, nonlinearity, degree) have been presented 
in [9,16,17,24]. Different simulations have been conducted to choose the most 
suitable functions. Table 1 summarizes their main characteristics for the (2, 128) 
COS cipher. 



Table 1. Characteristics of feedback functions of the (2, 128) COS Cipher

Function   Balanced   Correlation Immunity Order   Nonlinearity   Degree
f11        yes        CI(2)                        960            7
f9a        yes        CI(2)                        224            6
f9b        yes        CI(2)                        224            6



The strong trade-off between correlation-immunity and nonlinearity suppresses
any exploitable correlation, first, between the $L_1$ and $L_2$ initializations and
K, and secondly, between output blocks and K. Moreover, experiments tend
to show that nonlinearity (and to a certain extent correlation immunity order)
has an important impact on the randomness properties. In particular, we observe
that the best statistical results were obtained for functions having as high an
algebraic degree as possible. Thus, since the Boolean functions we have chosen
satisfy the best trade-offs among all these criteria, they provide results as good
as possible concerning randomness (see Section 3.3).

The Algebraic Normal Form structure has some importance too, if we consider
the following proposition:

Proposition 1 [10, page 115] A Feedback Shift Register (FSR) with feedback
function $f(x_{j-1}, x_{j-2}, \ldots, x_{j-L})$ is non singular (or equivalently every one
of its output sequences is periodic) if and only if f is of the form:

$$f(x_{j-1}, x_{j-2}, \ldots, x_{j-L}) = x_{j-L} \oplus g(x_{j-1}, x_{j-2}, \ldots, x_{j-L+1})$$

for some Boolean function g.

This very important property prevents degeneration in the output sequence of
the FSR (see [10, chap. VI and VII] for details). All the feedback functions we
chose meet this requirement.
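
The effect of Proposition 1 can be observed on a toy register. The sketch below is an illustration only: it assumes a 4-bit register and an arbitrarily chosen nonlinear g, neither of which comes from the COS specification, and the names g3, fsr_step and fsr_count_successors are hypothetical. It counts how many distinct successor states the register has; a count of 16 means the state update is a bijection, so every state lies on a cycle.

```c
/* g(x_{j-1}, x_{j-2}, x_{j-3}): an arbitrary nonlinear choice.
   Bit 3 of the state is x_{j-1} (newest), bit 0 is x_{j-4} (oldest). */
static unsigned g3(unsigned s)
{
    unsigned b1 = (s >> 1) & 1u, b2 = (s >> 2) & 1u, b3 = (s >> 3) & 1u;
    return (b1 & b2) ^ b3;
}

/* One clock of the 4-bit FSR.
   with_oldest != 0: feedback = x_{j-4} XOR g(...), the form of Proposition 1.
   with_oldest == 0: feedback = g(...) only, so x_{j-4} is simply discarded. */
static unsigned fsr_step(unsigned s, int with_oldest)
{
    unsigned fb = g3(s);
    if (with_oldest) fb ^= s & 1u;
    return (s >> 1) | (fb << 3);
}

/* Number of distinct successor states over all 16 possible states. */
static int fsr_count_successors(int with_oldest)
{
    unsigned seen = 0;
    int count = 0;
    for (unsigned s = 0; s < 16; s++) {
        unsigned n = fsr_step(s, with_oldest);
        if (!(seen & (1u << n))) {
            seen |= 1u << n;
            count++;
        }
    }
    return count;
}
```

With the oldest bit XORed into the feedback, all 16 states have distinct successors; without it, pairs of states collide and only 8 successors remain, which is the degeneration the proposition rules out.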

3.2 The Crossing over Mechanism Security 

Its aim is to use the $L_i$ internal states to build up output blocks directly and
thus to obtain very high encryption speed. Moreover, this mechanism directly takes
part in strengthening the overall scheme security.

First, the probability of having cycles in the output sequence is greatly reduced.
Predicting the presence of cycles of length strictly less than $2^L$, where L is the
register length, still remains a general open problem for NLFSR [11]. Only
statistical testing (when tractable) can be envisaged up to now to give insight
into possible cycles. The Boolean functions chosen for the COS cipher family meet
the few existing results in this area, which are presented in [10].

Suppose however that registers $L_1$ and $L_2$ present cycles of length respectively
$l_1$ and $l_2$ for some initializations. The crossing over mechanism will then
produce a cycle of length $\mathrm{lcm}(l_1, l_2)$, which is expected to
be extremely large. In the general case of an (n, 2L) cipher, cycles will be of length
$\mathrm{lcm}(l_1, \ldots, l_n)$.
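
As a small worked illustration of the cycle-length estimate above, the following sketch computes the least common multiple of two register cycle lengths. The function names cycle_gcd and cycle_lcm are hypothetical, and the code only evaluates the paper's estimate; it says nothing about what cycle lengths the COS registers actually have.

```c
/* Greatest common divisor by Euclid's algorithm. */
static unsigned long long cycle_gcd(unsigned long long a, unsigned long long b)
{
    while (b != 0) {
        unsigned long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Combined cycle length lcm(l1, l2); returns 0 if either length is 0.
   Dividing before multiplying keeps intermediate values small. */
static unsigned long long cycle_lcm(unsigned long long l1, unsigned long long l2)
{
    if (l1 == 0 || l2 == 0) return 0;
    return l1 / cycle_gcd(l1, l2) * l2;
}
```

Note that the combined length reaches the product of the two lengths only when they are coprime; if one length divides the other, the combined cycle is no longer than the longer of the two.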

One other important role of the crossing over mechanism is to suppress any
exploitable correlation between output blocks B and internal states of the $L_1$
and $L_2$ registers at any time. Since register contents are cross-xored together
and since each internal bit of each register takes either value with probability
exactly 1/2, it is impossible to guess any internal bit of either $L_1$ or $L_2$ by
knowing bits of B. In other words, if $L_j^i$ denotes the i-th bit of register $L_j$
and $B^k$ the k-th bit of B then:

$$P[L_1^i = a \mid B^k = b] = P[L_2^j = a \oplus b \mid B^k = b] = \tfrac{1}{2}$$

where $B^k = L_1^i \oplus L_2^j$.

A ciphertext-only attack (particularly for mode II) might seem to offer
interesting results. Indeed one could try to xor several ciphertext blocks
together to eliminate the output blocks and try to recover at least the underlying
plaintext. In fact, this gives access only to a modulo 2 sum of at least four
plaintext blocks and is then of no use.

More precisely, suppose that $B_t, B_{t+1}, B_{t+2}, B_{t+3}$ are combined with
plaintext blocks $M_t, M_{t+1}, M_{t+2}, M_{t+3}$, producing ciphertext blocks

$$C_t = B_t \oplus M_t,\quad C_{t+1} = B_{t+1} \oplus M_{t+1},\quad C_{t+2} = B_{t+2} \oplus M_{t+2},\quad C_{t+3} = B_{t+3} \oplus M_{t+3}.$$

We know that $B_{t+j}$ can be written as (if $L_i^{(t+j)}$ denotes the internal state of
register $L_i$ at time t + j):

$$B_{t+j} = (L_1^{(t+j)}[1] \oplus L_2^{(t+j)}[3]) + 2^{32} \cdot (L_1^{(t+j)}[0] \oplus L_2^{(t+j)}[2]) + 2^{64} \cdot (L_1^{(t+j)}[3] \oplus L_2^{(t+j)}[1]) + 2^{96} \cdot (L_1^{(t+j)}[2] \oplus L_2^{(t+j)}[0]) \tag{1}$$

where j = 0, ..., 3. Since crossing over gives (as a first approach let us ignore
the irregular clocking of the registers) for register $L_i$:

$$L_i^{(t+1)}[2] = L_i^{(t)}[0], \qquad L_i^{(t+1)}[3] = L_i^{(t)}[1] \tag{2}$$

at any instant t, with Equations 1 and 2 we obviously have:

$$C_t \oplus C_{t+1} \oplus C_{t+2} \oplus C_{t+3} = M_t \oplus M_{t+1} \oplus M_{t+2} \oplus M_{t+3} \tag{3}$$

Thus xoring together four ciphertext blocks eliminates the underlying output
blocks, yielding the modulo 2 sum of four plaintext blocks. This sum cannot in
any way be used to retrieve any of the four plaintext blocks. In other words,
Equation 3 shows that to recover one plaintext block, the cryptanalyst must
guess or know the three other plaintext blocks, that is to say he has to know
75% of the plaintext. When this is possible, it makes encryption meaningless,
above all for plaintext whose redundancy has been removed.

Recall, in addition, that the registers are irregularly clocked, so even such a
sum is impossible to obtain. To summarize, the crossing over mechanism greatly
increases the security of the COS ciphers.



3.3 Statistical Tests 

We have used the statistical tests defined by D.E. Knuth [14] to check the
randomness of some COS ciphers (mainly (2, 128) and (3, 128)).

These tests are: frequency test, serial test, gap test, poker test, coupon
collector's test, permutation test, run test, serial correlation test.

We did not notice any significant bias. The randomness results are very
good. Output bits behave like a coin-tossing experiment, independently of their
neighbours.

4 Performance Analysis (Mode II)

An (n, 2L) version of the COS cipher produces a 2(n − 1)L-bit block at each
iteration (mode II) with only L register shifts. Hence COS ciphers are extremely
fast ciphers. Performance analysis is presented in Table 2. It has been conducted
on a Pentium III with 500 MHz CPU, under Linux. The compiler was
gcc-egcs-2.91.66 (egcs-1.1.2 release). The approximate mean encryption speed
(in Mbits/sec.) is given. Performance analysis has been conducted on a relatively
optimized C implementation. Better results are bound to be obtained with either
a truly optimized C implementation or an assembler implementation. Code size
after complete compilation is less than 20 Kbytes.

Table 2. Approximate encryption speed of some COS ciphers

Cipher     Mean encryption speed
(2,128)    120 Mbits/sec.
(3,128)    253 Mbits/sec.
(4,128)    330 Mbits/sec.
(3,256)    500 Mbits/sec.
(2,512)    510 Mbits/sec.
(3,512)    978 Mbits/sec.

5 Conclusion and Challenge 

COS ciphers are new, ultrafast (vectorized) stream ciphers built entirely from
nonlinear feedback shift registers. They offer very high encryption speed and
very high cryptographic strength. In particular, they exhibit extremely good
randomness properties.

However, since only public evaluation can confirm the very high quality of
this cipher, we propose a 1000 euro cryptanalysis challenge with no time limit.
The first person to break the (2, 128) COS cipher wins the prize. Challenge rules
can be found in [7]. We hope that it will promote leading research in NLFSR theory.
These ciphers are placed under the GNU General Public Licence.

A C Implementation

We give a readable but not optimized C code of the (2, 128) COS cipher. Due to
lack of space, functions fM, fL1 and fL2 and look-up table T will be found
in [7].

void setkey(unsigned long int * L1, unsigned long int * L2)
{
    unsigned long int i, a, M[8], feed;

    /* M register common part initialization */
    M[0] = K1; M[1] = K2; M[2] = K3; M[3] = K4;

    /* M register user's key dependent part initialization */
    if(KEYSIZE == 256) {M[4] = K5; M[5] = K6; M[6] = K7; M[7] = K8;}
    if(KEYSIZE == 192)
    {
        M[4] = K5; M[5] = K6; a = K1 ^ K2 ^ K3;
        M[6] = T[(a & 0xFF)] | (T[((a >> 8) & 0xFF)] << 8);
        M[6] |= (T[((a >> 16) & 0xFF)] << 16) | (T[a >> 24] << 24);
        a = K4 ^ K5 ^ K6;
        M[7] = T[(a & 0xFF)] | (T[((a >> 8) & 0xFF)] << 8);
        M[7] |= (T[((a >> 16) & 0xFF)] << 16) | (T[a >> 24] << 24);
    }

    if(KEYSIZE == 128)
    {
        M[4] = T[(K1 & 0xFF)] | (T[((K1 >> 8) & 0xFF)] << 8);
        M[4] |= (T[((K1 >> 16) & 0xFF)] << 16) | (T[K1 >> 24] << 24);
        M[5] = T[(K2 & 0xFF)] | (T[((K2 >> 8) & 0xFF)] << 8);
        M[5] |= (T[((K2 >> 16) & 0xFF)] << 16) | (T[K2 >> 24] << 24);
        M[6] = T[(K3 & 0xFF)] | (T[((K3 >> 8) & 0xFF)] << 8);
        M[6] |= (T[((K3 >> 16) & 0xFF)] << 16) | (T[K3 >> 24] << 24);
        M[7] = T[(K4 & 0xFF)] | (T[((K4 >> 8) & 0xFF)] << 8);
        M[7] |= (T[((K4 >> 16) & 0xFF)] << 16) | (T[K4 >> 24] << 24);
    }

    M[0] ^= MK; /* Message key introduction */







    /* Shift M register 256 times */
    for(i = 0; i < 256; i++)
    {
        /* Gather the eleven tap bits into an index for the table fM */
        feed = ((M[0] & 0x80000000L) >> 21);
        feed |= ((M[0] & 0x8L) << 6);
        feed |= ((M[1] & 0x200L) >> 1);
        feed |= ((M[2] & 0x800L) >> 4);
        feed |= ((M[2] & 0x8L) << 3);
        feed |= ((M[3] & 0x400000L) >> 17);
        feed |= ((M[4] & 0x80000L) >> 15);
        feed |= ((M[5] & 0x800000L) >> 20);
        feed |= ((M[6] & 0x2000000L) >> 23);
        feed |= ((M[7] & 0x80000000L) >> 30);
        feed |= ((M[7] & 0x4L) >> 2);

        M[7] = (M[7] >> 1) | ((M[6] & 1) << 31);
        M[6] = (M[6] >> 1) | ((M[5] & 1) << 31);
        M[5] = (M[5] >> 1) | ((M[4] & 1) << 31);
        M[4] = (M[4] >> 1) | ((M[3] & 1) << 31);
        M[3] = (M[3] >> 1) | ((M[2] & 1) << 31);
        M[2] = (M[2] >> 1) | ((M[1] & 1) << 31);
        M[1] = (M[1] >> 1) | ((M[0] & 1) << 31);
        M[0] = (M[0] >> 1) | (fM[feed] << 31);
    }

    *L1++ = M[4]; *L1++ = M[5]; *L1++ = M[6]; *L1 = M[7];

    /* Clock M register 128 times */
    for(i = 0; i < 128; i++)
    {
        feed = ((M[0] & 0x80000000L) >> 21);
        feed |= ((M[0] & 0x8L) << 6);
        feed |= ((M[1] & 0x200L) >> 1);
        feed |= ((M[2] & 0x800L) >> 4);
        feed |= ((M[2] & 0x8L) << 3);
        feed |= ((M[3] & 0x400000L) >> 17);
        feed |= ((M[4] & 0x80000L) >> 15);
        feed |= ((M[5] & 0x800000L) >> 20);
        feed |= ((M[6] & 0x2000000L) >> 23);
        feed |= ((M[7] & 0x80000000L) >> 30);
        feed |= ((M[7] & 0x4L) >> 2);

        M[7] = (M[7] >> 1) | ((M[6] & 1) << 31);
        M[6] = (M[6] >> 1) | ((M[5] & 1) << 31);
        M[5] = (M[5] >> 1) | ((M[4] & 1) << 31);
        M[4] = (M[4] >> 1) | ((M[3] & 1) << 31);
        M[3] = (M[3] >> 1) | ((M[2] & 1) << 31);
        M[2] = (M[2] >> 1) | ((M[1] & 1) << 31);
        M[1] = (M[1] >> 1) | ((M[0] & 1) << 31);
        M[0] = (M[0] >> 1) | (fM[feed] << 31);
    }

    *L2++ = M[0]; *L2++ = M[1]; *L2++ = M[2]; *L2 = M[3];

    return;
}

void cos(unsigned long int * L1, unsigned long int * L2,
         unsigned long int * block, unsigned long int flag)
{
    unsigned long int av[4] = {64, 65, 66, 64}, clk, feed, i;

    clk = (L1[3] & 1) | ((L2[3] & 1) << 1);
    if (flag) {
        /* Clock L1 av[clk] times */
        for(i = 0L; i < av[clk]; i++)
        {
            feed = ((L1[3] & 0x4) >> 2);
            feed |= ((L1[3] & 0x20L) >> 4);
            feed |= ((L1[3] & 0x100L) >> 6);
            feed |= ((L1[3] & 0x8000L) >> 12);
            feed |= ((L1[3] & 0x4000000L) >> 22);
            feed |= ((L1[2] & 0x40L) >> 1);
            feed |= ((L1[2] & 0x1000L) >> 6);
            feed |= ((L1[2] & 0x8000L) >> 8);
            feed |= ((L1[2] & 0x2000000L) >> 17);

            L1[3] = (L1[3] >> 1) | ((L1[2] & 1) << 31);
            L1[2] = (L1[2] >> 1) | ((L1[1] & 1) << 31);
            L1[1] = (L1[1] >> 1) | ((L1[0] & 1) << 31);
            L1[0] = (L1[0] >> 1) | (fL1[feed] << 31);
        }}
    else {
        /* Clock L2 av[clk] times */
        for(i = 0L; i < av[clk]; i++)
        {
            feed = ((L2[3] & 0x4) >> 2);
            feed |= ((L2[3] & 0x20L) >> 4);
            feed |= ((L2[3] & 0x100L) >> 6);
            feed |= ((L2[3] & 0x8000L) >> 12);
            feed |= ((L2[3] & 0x4000000L) >> 22);
            feed |= ((L2[2] & 0x40L) >> 1);
            feed |= ((L2[2] & 0x1000L) >> 6);
            feed |= ((L2[2] & 0x8000L) >> 8);
            feed |= ((L2[2] & 0x2000000L) >> 17);

            L2[3] = (L2[3] >> 1) | ((L2[2] & 1) << 31);
            L2[2] = (L2[2] >> 1) | ((L2[1] & 1) << 31);
            L2[1] = (L2[1] >> 1) | ((L2[0] & 1) << 31);
            L2[0] = (L2[0] >> 1) | (fL2[feed] << 31);
        }}

    /* Crossing over: build the 128-bit output block */
    *block++ = (L2[0] ^ L1[2]); *block++ = (L2[1] ^ L1[3]);
    *block++ = (L2[2] ^ L1[0]); *block = (L2[3] ^ L1[1]);

    return;

}

B General Description of the (2, 2L) COS Ciphers



[Figure: general diagram of the (2, 2L) COS ciphers; the only recoverable label is "Output randomly chosen blocks (mode I)".]

Abstract. This paper specializes the signature forgery by Coron, Naccache
and Stern (1999) to Rabin-type systems. We present a variation
in which the adversary may derive the private keys and thereby forge
the signature on any chosen message. Further, we demonstrate that,
contrary to RSA, the use of larger (even) public exponents does
not reduce the complexity of the forgery. Finally, we show that our
technique is very general and applies to any Rabin-type system designed
in a unique factorization domain, including Williams' $M^3$ scheme
(1986), the cubic schemes of Loxton et al. (1992) and of Scheidler
(1998), and the cyclotomic schemes (1995).

Keywords. Rabin-type systems, digital signatures, signature forgeries,
factorization.



1 Introduction 

In this paper, we specialize the signature forgery of Coron, Naccache and Stern [8]
to Rabin-type systems [19,24]. We present a variation in which the adversary
may derive the private keys and thereby forge the signature on any chosen
message. Further, we demonstrate that, contrary to RSA, systems using
larger (even) public exponents are equally susceptible to the presented forgery.
We also show that our technique is very general and applies to any Rabin-type
system designed in a unique factorization domain, including Williams' $M^3$
scheme [27], the cubic schemes of Loxton et al. [16] and of Scheidler [20], and
the cyclotomic schemes [21].

As an application, we analyze the implications of our forgery against the
PKCS #1 standard [4]. Finally, and of independent interest, we propose a generic
technique (i.e., applicable to any message encoding method) that reduces the
overall complexity of a forgery from n to $\sqrt{n}$.

* A working draft of this work was presented at the ISO/IEC JTC1/SC27/WG2
meeting in August 1999.



B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 99—113, 2001. 
The rest of this paper is organized as follows. We begin with a brief 
presentation of Rabin-type systems. Next, in Section 3, we review the 
Coron-Naccache-Stern forgery and turn it into a universal forgery against 
Rabin-type signature schemes. In Section 4, we generalize the forgery to higher 
exponents and higher degree schemes. We apply it to the PKCS#1 encoding method 
in Section 5, where we also present a generic technique for reducing the 
complexity of the forgery. Finally, we conclude in Section 6. 



2 Rabin-Type Systems 

In this section, following the IEEE/P1363 specifications for public-key 
cryptography [2] (see also [17, Chapter 11]), we present a modified version of the 
Rabin-Williams signature scheme [19,24]. The scheme consists of three 
algorithms: the setup, the signature and the verification. For setting up the 
system, each user generates a pair of public/private keys. The private key is 
used to sign messages with the signature algorithm. Using the corresponding 
public key, a signature can then be verified and the signed message recovered 
with the verification algorithm. 

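setup: Generate two primes $p$, $q$ such that $p \equiv 3 \pmod 8$ and 
$q \equiv 7 \pmod 8$, and compute $n = pq$. Define an appropriate 
"representation" function $R : \mathcal{M} \to \mathcal{M}_R : m \mapsto \tilde m = R(m)$, 
where $\mathcal{M}$ is the set of valid messages and 
$\mathcal{M}_R = \{\tilde m = R(m) \in (\mathbb{Z}/n\mathbb{Z})^* : \tilde m \equiv 6 \pmod{16}\}$ 
is the set of message representatives. The public key is $n$ and the private 
key is $d = (n - p - q + 5)/8$. 

signature: Compute $\tilde m = R(m)$ and $\hat m$ given by 

\[ \hat m = \begin{cases} \tilde m \bmod n & \text{if } (\tilde m|n) = 1 \\ \tilde m/2 \bmod n & \text{if } (\tilde m|n) = -1 \end{cases} \]

The signature on message $m$ is $s = \hat m^d \bmod n$. 

verification: Compute $m' = s^2 \bmod n$. Then, take 

\[ \tilde m = \begin{cases} m' & \text{if } m' \equiv 6 \pmod 8 \\ 2m' & \text{if } m' \equiv 3 \pmod 8 \\ n - m' & \text{if } m' \equiv 7 \pmod 8 \\ 2(n - m') & \text{if } m' \equiv 2 \pmod 8 \end{cases} \]

If $\tilde m \in \mathcal{M}_R$ then the signature is accepted and message $m$ 
is recovered from $\tilde m$. 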
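
The three algorithms above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`jacobi`, `keygen`, `sign`, `verify`) are ours, the representation function R is left out (the functions work directly on a representative congruent to 6 modulo 16), and the small primes in the usage note are only illustrative.

```python
from math import gcd

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a|n) for an odd modulus n > 0 (standard binary algorithm)."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):      # (2|n) = -1 exactly when n = 3 or 5 (mod 8)
                result = -result
        a, n = n, a                  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def keygen(p: int, q: int):
    """Return (n, d) for primes p = 3 (mod 8) and q = 7 (mod 8)."""
    assert p % 8 == 3 and q % 8 == 7
    n = p * q
    return n, (n - p - q + 5) // 8

def sign(m_tilde: int, n: int, d: int) -> int:
    """Sign a message representative m_tilde = 6 (mod 16)."""
    assert m_tilde % 16 == 6 and gcd(m_tilde, n) == 1
    # Halve the representative when its Jacobi symbol is -1, so that the
    # value actually exponentiated always has Jacobi symbol +1.
    m_hat = m_tilde if jacobi(m_tilde, n) == 1 else m_tilde // 2
    return pow(m_hat, d, n)

def verify(s: int, n: int):
    """Return the recovered representative, or None if the signature is rejected."""
    m1 = pow(s, 2, n)
    case = m1 % 8
    if case == 6:
        m_tilde = m1
    elif case == 3:
        m_tilde = 2 * m1
    elif case == 7:
        m_tilde = n - m1
    elif case == 2:
        m_tilde = 2 * (n - m1)
    else:
        return None
    return m_tilde if m_tilde % 16 == 6 else None
```

For instance, with `n, d = keygen(8731, 3079)`, calling `verify(sign(294, n, d), n)` gives back 294: the signer halves 294 to 147 because (294|n) = -1, and the verifier undoes the halving from the residue of $s^2$ modulo 8.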



Remark 1. A signature scheme with appendix can also be defined along these 
lines. In that case, the set of message representatives is given by 
$\mathcal{M}_R = \{\tilde m \in (\mathbb{Z}/n\mathbb{Z})^* : \tilde m \equiv 10 \pmod{16}\}$. 

Remark 2. The recommended function $R$ for message encoding is the 
international standard ISO/IEC 9796 [3]. Another possible encoding is specified 
in PKCS#1 v2.0 [4]; the latter, however, only covers signature schemes with 
appendix. 



Remark 3. The presented scheme supposes $p \equiv 3 \pmod 8$ and $q \equiv 7 \pmod 8$. 
Similar schemes with form-free primes may be found in [26,13]. 

Noticing that $2d = (p-1)(q-1)/4 + 1$, the correctness of the method follows 
from the next lemma; moreover, it uses the fact that if $n \equiv 5 \pmod 8$ then 
$(-z|n) = (z|n) = -(2z|n)$. 

Lemma 1. Let $n = pq$, where $p$, $q$ are distinct primes and $p, q \equiv 3 \pmod 4$. If 
$(z|n) = 1$, then 

\[ z^{(p-1)(q-1)/4} \equiv \pm 1 \pmod n . \]

Proof. See [24, Lemma 1]. □ 

3 Signature Forgeries 

This section reviews the Coron-Naccache-Stern forgery [8] when applied to the 
Rabin-Williams scheme. In the second part, we modify it into a universal 
forgery so that the signature on any message can be obtained without knowing 
the private key. 

3.1 Coron-Naccache-Stern Forgery 

As mentioned above, for each message representative $\tilde m_i$, we have 
$\hat m_i = \tilde m_i$ if $(\tilde m_i|n) = 1$ and $\hat m_i = \tilde m_i/2$ if 
$(\tilde m_i|n) = -1$; the corresponding signature is then given by 
$s_i = \hat m_i^d \bmod n$. (Note here that $(\hat m_i|n) = 1$.) 

Suppose that an adversary has collected several pairs $(\hat m_i, s_i)$ such that the 
$\hat m_i$'s are smooth (modulo $n$).^1 More precisely, suppose she knows 

\[ \hat m_i \equiv (-1)^{v_{0,i}} \prod_{1 \le j \le B} p_j^{v_{j,i}} \pmod n , \tag{1} \]

where $p_1 < \cdots < p_B$ are prime and $v_{j,i} \in \mathbb{Z}$, and $s_i = \hat m_i^d \bmod n$, 
for $1 \le i \le \ell$. Then she can forge the signature on a message $m_\tau$, 
provided that $\hat m_\tau$ is smooth (modulo $n$), as follows.^2 

^1 It is here essential to note that the smoothness requirement must only be satisfied 
modulo $n$. For example, 197 is prime in $\mathbb{Z}$ but is 3-smooth as an element of 
$\mathbb{Z}/437\mathbb{Z}$, i.e., $197 \equiv 2^9 \cdot 3^{-3} \pmod{437}$. 

^2 In [8], the authors only consider positive "messages" $\hat m_i$. We slightly generalize 
their presentation by introducing the term $(-1)^{v_{0,i}}$ in Eq. (1). 
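To each $\hat m_i$, she associates the $B$-tuple $V_i$ given by 
$V_i = (v_{1,i}, \ldots, v_{B,i})$. Let $V_\tau$ denote the $B$-tuple corresponding to 
$\hat m_\tau$. If there exist integers $\beta_i$ such that 
$V_\tau \equiv \sum_{1 \le i \le \ell} \beta_i V_i \pmod 4$, then 

\[ v_{j,\tau} = \sum_{1 \le i \le \ell} \beta_i\, v_{j,i} - 4\gamma_j \qquad (1 \le j \le B) \tag{2} \]

for some $\gamma_j \in \mathbb{Z}$. 

Hence, the signature on message $m_\tau$ can be expressed as 

\begin{align*}
s_\tau := \hat m_\tau^{\,d} \bmod n
  &\equiv (-1)^{d\,v_{0,\tau}} \prod_{1 \le j \le B} p_j^{\,d\,v_{j,\tau}} && \text{(from Eq. (1))} \\
  &\equiv \prod_{1 \le j \le B} p_j^{\,d\,v_{j,\tau}} && \text{(since $d$ is even)} \\
  &\equiv \prod_{1 \le i \le \ell} \Bigl(\prod_{1 \le j \le B} p_j^{\,d\,v_{j,i}}\Bigr)^{\beta_i} \prod_{1 \le j \le B} p_j^{-4d\gamma_j} && \text{(from Eq. (2))} \\
  &\equiv \prod_{1 \le i \le \ell} s_i^{\beta_i} \prod_{1 \le j \le B} p_j^{-2\gamma_j} \pmod n . \tag{3}
\end{align*}

The only difference with [8] is that we use the weaker relation 
$p_j^{4d} \equiv p_j^2 \pmod n$. (The relation $p_j^{2d} \equiv \pm p_j \pmod n$ only holds 
when $(p_j|n) = 1$; see Lemma 1.) 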

3.2 Universal Forgery 

We will now show that we can do much better than forging the signature on 
a smooth $\hat m_\tau$, namely, forging the signature on any chosen $\hat m_\tau$. We need the 
following proposition. 

Proposition 1. Let $n = pq$, where $p$, $q$ are distinct primes and $p, q \equiv 3 \pmod 4$, 
and let $d = (n - p - q + 5)/8$. If $(z|n) = -1$, then 

\[ \gcd\bigl(z^{2d} \mp z \ (\mathrm{mod}\ n),\ n\bigr) = p \text{ or } q . \]

Proof. Since $p, q \equiv 3 \pmod 4$, both $(p-1)/2$ and $(q-1)/2$ are odd. Hence, 
$z^{2d} = z^{\frac{p-1}{2} \cdot \frac{q-1}{2}}\, z \equiv (z|p)^{(q-1)/2}\, z \equiv (z|p)\, z \pmod p$, 
and similarly $z^{2d} \equiv (z|q)\, z \pmod q$. Noting that $(z|p) = -(z|q) = \pm 1$, 
the proposition is proved. □ 
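
Proposition 1 is easy to check numerically. The sketch below is only an illustration on small primes of our choosing ($p = 8731$, $q = 3079$); the helper name `split` is hypothetical. It computes both gcds of the proposition for a few values of $z$.

```python
from math import gcd

# Illustrative parameters: p = 3 (mod 8), q = 7 (mod 8).
p, q = 8731, 3079
n = p * q
d = (n - p - q + 5) // 8

def split(z: int):
    """Return (gcd(z^(2d) - z, n), gcd(z^(2d) + z, n))."""
    t = pow(z, 2 * d, n)
    return gcd((t - z) % n, n), gcd((t + z) % n, n)

# z = 2, 7 and 21 all have Jacobi symbol (z|n) = -1 for this modulus,
# so each call reveals p and q; z = 3 has (3|n) = +1 and reveals nothing.
for z in (2, 7, 21, 3):
    print(z, split(z))
```

When $(z|n) = -1$, one of the two gcds is $p$ and the other is $q$; when $(z|n) = +1$, the pair is the trivial $\{1, n\}$, which is why the attack needs the signature of a square whose root has Jacobi symbol $-1$.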

The above proposition suggests that if an adversary can derive the signature 
$s_\tau$ on an $\hat m_\tau$ such that $(\hat m_\tau|n) = -1$, then she can factor the modulus 
by computing $\gcd(s_\tau^2 - \hat m_\tau \ (\mathrm{mod}\ n),\ n)$. This, however, is not 
possible. Define 

\[ J(n,B) = \{\, 1 \le j \le B : p_j \text{ is prime and } (p_j|n) = -1 \,\} . \tag{4} \]
Since $(\hat m_i|n) = 1$ (cf. beginning of § 3.1), we must have 

\[ \sum_{j \in J(n,B)} v_{j,i} \equiv 0 \pmod 2 . \tag{5} \]

So, any linear combination, as done in Eq. (2), will always yield 
$\sum_{j \in J(n,B)} v_{j,\tau} \equiv 0 \pmod 2$, resulting in $(\hat m_\tau|n) = 1$. 

But there is another way to consider Proposition 1: if the adversary is able 
to obtain the signature on $\hat m_\tau = \pm r_\tau^2$ for some $r_\tau$ such that 
$(r_\tau|n) = -1$ (note here that $(\hat m_\tau|n) = (\pm 1|n) \cdot (r_\tau|n)^2 = 1 \cdot 1 = 1$), then 

\[ \gcd\bigl(s_\tau - r_\tau \ (\mathrm{mod}\ n),\ n\bigr) \tag{6} \]

will give a non-trivial factor of $n$. To make this feasible, in addition to 
satisfying Eq. (2), the components of $V_\tau$ must satisfy 

\[ v_{j,\tau} \equiv 0 \pmod 2 \quad \text{for all } 1 \le j \le B \tag{7} \]

and 

\[ \sum_{j \in J(n,B)} v_{j,\tau} \equiv 2 \pmod 4 . \tag{8} \]

The first condition ensures that the "message" corresponding to $V_\tau$ is a square 
(i.e., $\hat m_\tau = \pm r_\tau^2$), while the second condition ensures that $(r_\tau|n) = -1$. 
Replacing $v_{j,\tau}$ by Eq. (2), Eqs (7) and (8) can respectively be rewritten as 

\[ \sum_{1 \le i \le \ell} \beta_i\, v_{j,i} \equiv 0 \pmod 2 \quad \text{for all } 1 \le j \le B \tag{9} \]

and, defining $2c_i = \bigl(\sum_{j \in J(n,B)} v_{j,i}\bigr) \bmod 4$ ($\in \{0, 2\}$ from Eq. (5)), 

\[ \sum_{j \in J(n,B)} \sum_{1 \le i \le \ell} \beta_i\, v_{j,i} = \sum_{1 \le i \le \ell} \beta_i \sum_{j \in J(n,B)} v_{j,i} \equiv \sum_{1 \le i \le \ell} \beta_i\, 2c_i \equiv 2 \pmod 4 
\iff \sum_{1 \le i \le \ell} \beta_i\, c_i \equiv 1 \pmod 2 . \tag{10} \]

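To sum up, an adversary can recover the secret factorization of $n$ by carrying 
out the following steps: 

(I)   For $1 \le i \le \ell$, write $\hat m_i \equiv (-1)^{v_{0,i}} \prod_{1 \le j \le B} p_j^{v_{j,i}} \pmod n$; 

(II)  Define $V_i = (v_{1,i}, \ldots, v_{B,i})$ and 
      $c_i = \bigl( (\sum_{j \in J(n,B)} v_{j,i}) \bmod 4 \bigr) / 2$; 

(III) Find $\beta_1, \ldots, \beta_\ell$ such that 
      a) for $1 \le j \le B$, $\sum_{1 \le i \le \ell} \beta_i\, v_{j,i} \equiv 0 \pmod 2$; 
      b) $\sum_{1 \le i \le \ell} \beta_i\, c_i \equiv 1 \pmod 2$; 

(IV)  Compute $V_\tau = (v_{1,\tau}, \ldots, v_{B,\tau}) \in (\mathbb{Z}/4\mathbb{Z})^B$, 
      where $v_{j,\tau} = \sum_{1 \le i \le \ell} \beta_i\, v_{j,i} - 4\gamma_j$ for some $\gamma_j \in \mathbb{Z}$; 

(V)   Set $r_\tau = \prod_{1 \le j \le B} p_j^{v_{j,\tau}/2} \pmod n$; 

(VI)  Compute $s_\tau = \prod_{1 \le i \le \ell} s_i^{\beta_i} \prod_{1 \le j \le B} p_j^{-2\gamma_j} \bmod n$; 

(VII) Recover the factors of $n$ by computing $\gcd(s_\tau - r_\tau \ (\mathrm{mod}\ n),\ n)$. 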



Example 1. Here is a "toy" example to illustrate the forgery. Let $p = 8731$ 
($\equiv 3 \pmod 8$) and $q = 3079$ ($\equiv 7 \pmod 8$), yielding a modulus 
$n = 26882749$. So, the private exponent is given by $d = 3358868$. 

We consider $p_4$-smooth message representatives $\tilde m_i \in \mathcal{M}_R$. We have 
$J(n,4) = \{2, 7\}$ (that is, the primes $p_1 = 2$ and $p_4 = 7$). Suppose we are given: 

    $\tilde m_i$   $\hat m_i$ (mod n)        $s_i$      $V_i$       $c_i$
    ------------   -----------------------   --------   ---------   -----
    70             70  (= 2·5·7)             8417525    (1,0,1,1)   1
    294            147 (= 3·7^2)             11480098   (0,1,0,2)   1
    486            243 (= 3^5)               16287310   (0,5,0,0)   0
    630            630 (= 2·3^2·5·7)         1630174    (1,2,1,1)   1

Conditions (III-a) and (III-b) yield 

\[ \begin{cases} \beta_1 + \beta_4 \equiv 0 \pmod 2 \\ \beta_2 + \beta_3 \equiv 0 \pmod 2 \\ \beta_1 + \beta_2 + \beta_4 \equiv 1 \pmod 2 \end{cases}
\iff \begin{cases} \beta_1 \equiv \beta_4 \pmod 2 \\ \beta_2 \equiv \beta_3 \equiv 1 \pmod 2 \end{cases} \]

Taking $\beta_1 = \beta_4 = 0$ and $\beta_2 = \beta_3 = 1$, we have $V_\tau = (0, 2, 0, 2)$, which 
corresponds to $\hat m_\tau = 3^2 \cdot 7^2 = 21^2$, whose signature is given by 
$s_\tau = s_2\, s_3\, 3^{-2} \bmod n = 8076196$. So, the factorization of $n$ is obtained by 
computing $\gcd(8076196 - 21, n) = 8731$ ($= p$). ◊ 
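
Steps (IV) to (VII) can be replayed on this toy example with a short script. The sketch below makes two simplifications that are ours, not the paper's: the signer is simulated locally (so the script needs $d$, which a real adversary would not have), and the vector $\beta$ is passed in by hand rather than solved for. The names `from_exponents` and `forge_and_factor` are hypothetical.

```python
from math import gcd

# Secret parameters, used here only to simulate the signing oracle.
p, q = 8731, 3079
n = p * q
d = (n - p - q + 5) // 8

primes = [2, 3, 5, 7]
# Exponent vectors V_i of the adjusted representatives 70, 147, 243, 630.
V = [(1, 0, 1, 1), (0, 1, 0, 2), (0, 5, 0, 0), (1, 2, 1, 1)]

def from_exponents(vec):
    """Multiply out prod p_j^v_j modulo n."""
    x = 1
    for pj, v in zip(primes, vec):
        x = x * pow(pj, v, n) % n
    return x

m_hat = [from_exponents(v) for v in V]
sigs = [pow(m, d, n) for m in m_hat]      # signatures the adversary collects

def forge_and_factor(beta):
    """Steps (IV)-(VII): return (r_tau, s_tau, gcd(s_tau - r_tau, n))."""
    sums = [sum(b * v[j] for b, v in zip(beta, V)) for j in range(len(primes))]
    v_tau = [s % 4 for s in sums]         # V_tau in (Z/4Z)^B
    gamma = [s // 4 for s in sums]        # so that sums = v_tau + 4*gamma
    assert all(v % 2 == 0 for v in v_tau), "condition (III-a) violated"
    r_tau, s_tau = 1, 1
    for pj, v, g in zip(primes, v_tau, gamma):
        r_tau = r_tau * pow(pj, v // 2, n) % n
        s_tau = s_tau * pow(pj, -2 * g, n) % n   # modular inverse (Python 3.8+)
    for b, s in zip(beta, sigs):
        s_tau = s_tau * pow(s, b, n) % n
    return r_tau, s_tau, gcd((s_tau - r_tau) % n, n)

print(forge_and_factor((0, 1, 1, 0)))     # beta from Example 1
```

With $\beta = (0, 1, 1, 0)$ the script returns $r_\tau = 21$, $s_\tau = 8076196$ and the factor 8731, matching the example. Only the public values $n$, the primes $p_j$ and the collected signatures enter `forge_and_factor`.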

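We will see below (Algorithm 1) a simple method to compute a solution 
$(\beta_1, \ldots, \beta_\ell) \in (\mathbb{Z}/2\mathbb{Z})^\ell$. This is a slight modification of 
Algorithm 2.3.1 in [7, pp. 56-57] (see also [12, Algorithm N, pp. 425-426]). 

Using matrix notations, Conditions (III-a) and (III-b) can be rewritten as 

\[
\underbrace{\begin{pmatrix}
v_{1,1} & \cdots & v_{1,\ell} & 0 \\
\vdots  &        & \vdots     & \vdots \\
v_{B,1} & \cdots & v_{B,\ell} & 0 \\
c_1     & \cdots & c_\ell     & 1
\end{pmatrix} \bmod 2}_{:=\,U}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_\ell \\ 1 \end{pmatrix}
\equiv
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix} \pmod 2 . \tag{11}
\]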
So, the problem is reduced to finding a vector $K = (\kappa_1, \ldots, \kappa_{\ell+1}) \in \ker U$ 
(i.e., the kernel of matrix $U$) whose last coordinate, $\kappa_{\ell+1}$, is equal to 1. In that 
case, we have $\beta_i = \kappa_i$, $1 \le i \le \ell$. 

Algorithm 1. This algorithm computes a solution (if any) $\beta_1, \ldots, \beta_\ell$ satisfying 
Eq. (11). We let $u_{r,c}$ denote the entry (modulo 2) at row $r$ and column $c$ of 
matrix $U$. 

1. [initialization] Set $c \leftarrow 1$ and for $1 \le j \le B + 1$, set $t_j \leftarrow 0$. 

2. [scanning] If there is some $r$ in the range $1 \le r \le B + 1$ such that $u_{r,c} = 1$ and 
   $t_r = 0$, then go to Step 3. Otherwise, go to Step 5. 

3. [elimination] For all $j \ne r$, if $u_{j,c} = 1$ then add (modulo 2) row $r$ to row $j$. Set 
   $t_r \leftarrow c$. 

4. [loop] If $c < \ell$, then set $c \leftarrow c + 1$ and go to Step 2. 

5. [kernel] Evaluate the vector $K = (\kappa_1, \ldots, \kappa_{\ell+1})$ defined by 

   \[ \kappa_i = \begin{cases} u_{j,c} & \text{if } t_j = i > 0 \\ 1 & \text{if } i = c \\ 0 & \text{otherwise} \end{cases} \]

   If $\kappa_{\ell+1} = 0$ and $c \le \ell$, then set $c \leftarrow c + 1$ and go to Step 2. 

6. [output] If $\kappa_{\ell+1} = 1$, then output $\beta_i \leftarrow \kappa_i$ for all $1 \le i \le \ell$; otherwise, 
   output no solution. 

Example 1 (cont'd). If we apply the previous algorithm to Example 1, matrix $U$ 
is given by 

\[ U = \begin{pmatrix} 1&0&0&1&0 \\ 0&1&1&0&0 \\ 1&0&0&1&0 \\ 1&0&0&1&0 \\ 1&1&0&1&1 \end{pmatrix} . \]

We then successively obtain for $c = 1, 2, 3$ 

\[ U = \begin{pmatrix} \boxed{1}&0&0&1&0 \\ 0&1&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \\ 0&1&0&0&1 \end{pmatrix},\ 
\begin{pmatrix} \boxed{1}&0&0&1&0 \\ 0&\boxed{1}&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \\ 0&0&1&0&1 \end{pmatrix},\ 
\begin{pmatrix} \boxed{1}&0&0&1&0 \\ 0&\boxed{1}&0&0&1 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \\ 0&0&\boxed{1}&0&1 \end{pmatrix} \]

after the elimination step. We also have $t_1 = 1$, $t_2 = 2$, $t_5 = 3$ (which is indicated 
by the boxes ($t_r = c$)) and $t_3 = t_4 = 0$. At this point, we have $c = 4$ and the kernel 
step yields the vector $K = (u_{1,4}, u_{2,4}, u_{5,4}, 1, 0) = (1, 0, 0, 1, 0)$. Since $\kappa_5 = 0$, we 
increment $c$, $c = 5$, and go to the scanning step. Then, since $t_2 \ne 0$ and $t_5 \ne 0$ 
(note that $u_{2,5} = u_{5,5} = 1$), we directly go to the kernel step and obtain the 
new vector $K = (u_{1,5}, u_{2,5}, u_{5,5}, 0, 1) = (0, 1, 1, 0, 1)$. So, we finally find $\beta_1 = 0$, 
$\beta_2 = 1$, $\beta_3 = 1$ and $\beta_4 = 0$. ◊ 
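
Conditions (III-a) and (III-b) form an ordinary linear system over GF(2), so any solver for such systems will do. The sketch below uses plain Gauss-Jordan elimination on the augmented system rather than the kernel-based Algorithm 1; it is a swapped-in standard technique, and the function name `solve_beta` is ours.

```python
def solve_beta(V, c):
    """Solve conditions (III-a) and (III-b) over GF(2).

    V[i] is the exponent vector of the i-th collected message and c[i] its
    parity indicator c_i. Returns a list beta of 0/1 values, or None when
    the system has no solution.
    """
    l, B = len(V), len(V[0])
    # One equation per prime (right-hand side 0), plus the c-equation (rhs 1).
    rows = [[V[i][j] % 2 for i in range(l)] + [0] for j in range(B)]
    rows.append([c[i] % 2 for i in range(l)] + [1])

    pivots = []                       # (row index, column index) of each pivot
    r = 0
    for col in range(l):
        pr = next((k for k in range(r, len(rows)) if rows[k][col]), None)
        if pr is None:
            continue                  # free column
        rows[r], rows[pr] = rows[pr], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col]:
                rows[k] = [a ^ b for a, b in zip(rows[k], rows[r])]
        pivots.append((r, col))
        r += 1

    # A row reading 0 = 1 means the system is inconsistent.
    if any(row[-1] and not any(row[:-1]) for row in rows):
        return None

    beta = [0] * l                    # free variables are set to 0
    for pr, col in pivots:
        beta[col] = rows[pr][-1]
    return beta

V = [(1, 0, 1, 1), (0, 1, 0, 2), (0, 5, 0, 0), (1, 2, 1, 1)]
c = [1, 1, 0, 1]
print(solve_beta(V, c))
```

On the data of Example 1 this returns `[0, 1, 1, 0]`, the same $\beta$ as above. Setting free variables to 0 picks one solution among several; any other solution of the system would serve the forgery equally well.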
3.3 Improvements 

The methods we have presented so far are subject to numerous possible 
improvements. We just mention two of these. 

In Eq. (1), we consider only "messages" $\hat m_i$ whose prime factors (modulo $n$) 
are at most $p_B$. As for modern factorization methods, a substantial speed-up can 
be obtained by also considering the $\hat m_i$'s which are $p_B$-smooth except for one or 
two factors [15]. Another speed-up can be obtained by using structured Gaussian 
elimination to solve Eq. (11); see [18] for an efficient variation directly applicable 
to our case. 

4 Generalizations 

4.1 Higher Exponents 

The signature scheme presented in Section 2 can be generalized to other even 
public exponents besides $e = 2$. Define $\lambda = \mathrm{lcm}[(p-1)/2, (q-1)/2]$. It suffices 
to choose $e$ relatively prime to $\lambda$; the corresponding private exponent $d$ is then 
given according to $ed \equiv 1 \pmod \lambda$ (see [24]). The scheme remains exactly the 
same except that $m' = s^2 \bmod n$ must be replaced by $m' = s^e \bmod n$ in the 
verification stage. 

In that setting, Proposition 1 becomes 

Proposition 2. Let $n = pq$, where $p$, $q$ are distinct primes and $p, q \equiv 3 \pmod 4$, 
and let $e$, $d$ be such that $e$ is even, $\gcd(e, \lambda) = 1$ and $ed \equiv 1 \pmod \lambda$. If 
$(z|n) = -1$, then 

\[ \gcd\bigl(z^{ed} \mp z \ (\mathrm{mod}\ n),\ n\bigr) = p \text{ or } q . \]

Proof. From $ed \equiv 1 \pmod \lambda$, we deduce that $ed \equiv 1 \pmod{(p-1)/2}$ and so 
there exists $\gamma \in \mathbb{Z}$ such that $ed = \gamma\,\frac{p-1}{2} + 1$. Further, since $p \equiv 3 \pmod 4$, 
$(p-1)/2$ is odd. Hence, $\gamma$ must be odd since $ed$ is even. Consequently, 
$z^{ed} = z^{\gamma (p-1)/2}\, z \equiv (z|p)^\gamma\, z \equiv (z|p)\, z \equiv \pm z \pmod p$ and similarly 
$z^{ed} \equiv (z|q)\, z \equiv -(z|p)\, z \equiv \mp z \pmod q$, which completes the proof. □ 

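Since $e$ is even, we can write $e = 2e_1$. It is here worth remarking that anyone 
can raise an element to the $e_1$-th power (modulo $n$). So, if an adversary follows 
Steps (I)-(VI) as described in Section 3, she can recover the factors of $n$ by 
computing 

\[ \gcd\bigl(S_\tau - r_\tau \ (\mathrm{mod}\ n),\ n\bigr), \tag{12} \]

where 

\begin{align*}
S_\tau := s_\tau^{\,e_1} \bmod n = (\hat m_\tau^{\,d})^{e_1}
  &\equiv \prod_{1 \le i \le \ell} (s_i^{\,e_1})^{\beta_i} \prod_{1 \le j \le B} (p_j^{\,4 d e_1})^{-\gamma_j} \\
  &\equiv \prod_{1 \le i \le \ell} s_i^{\,e_1 \beta_i} \prod_{1 \le j \le B} p_j^{-2\gamma_j} \pmod n ,
\end{align*}

the last step using $p_j^{4 d e_1} = p_j^{2ed} \equiv p_j^2 \pmod n$. 

The forgery is thus no more expensive against a scheme with a large public 
exponent $e$ than against the basic scheme with $e = 2$. 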
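
As a sanity check of the higher-exponent case, the toy computation below repeats the forgery on the square $21^2 = 3^2 \cdot 7^2$ with public exponent $e = 4$. All concrete values ($p$, $q$, $e$, the two collected messages 147 and 243, and the combination $\beta_2 = \beta_3 = 1$, $\gamma = 1$ for the prime 3) are illustrative choices of ours, and the signer is again simulated locally.

```python
from math import gcd

p, q = 8731, 3079                 # secret; used only to simulate the signer
n = p * q
a, b = (p - 1) // 2, (q - 1) // 2
lam = a * b // gcd(a, b)          # lambda = lcm[(p-1)/2, (q-1)/2]

e = 4                             # even public exponent, coprime to lambda
assert gcd(e, lam) == 1
d = pow(e, -1, lam)               # private exponent, e*d = 1 (mod lambda)
e1 = e // 2

# Collected messages 147 = 3 * 7^2 and 243 = 3^5 (both of Jacobi symbol +1)
# and their signatures under exponent d.
m_hat = [147, 243]
sigs = [pow(m, d, n) for m in m_hat]

# 147 * 243 = 3^6 * 7^2 = (3^2 * 7^2) * 3^4, hence V_tau = (2, 2) on the
# primes (3, 7), gamma = 1 for the prime 3, and r_tau = 3 * 7 = 21.
r_tau = 21
S_tau = 1
for s in sigs:
    S_tau = S_tau * pow(s, e1, n) % n     # anyone can raise to the e1-th power
S_tau = S_tau * pow(3, -2, n) % n         # the factor p_j^(-2*gamma_j)

factor = gcd((S_tau - r_tau) % n, n)
print(S_tau, factor)
```

The script recovers the factor 8731, and $S_\tau$ takes the same value 8076196 as $s_\tau$ in the $e = 2$ example, since both are congruent to $21$ modulo $p$ and to $-21$ modulo $q$.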
4.2 Higher Degree Schemes 

Rabin-type schemes can be developed in any unique factorization domain. In [27], 
Williams presents a scheme, called $M^3$, with public exponent 3 using arithmetic 
in a quadratic number field. This scheme was later extended to cyclotomic fields 
by Scheidler and Williams in [21], where they also give a scheme with public 
exponent 5. In [20], Scheidler modifies Williams' $M^3$ scheme so that it works 
with a larger class of primes. Finally, in [16], Loxton et al. give another cubic 
scheme, the main difference with [27] being its easy geometrical interpretation. 
In this paragraph, we will stick to this latter scheme because it most resembles 
the Rabin-Williams scheme presented in Section 2. 

The scheme of Loxton et al. uses the ring of Eisenstein integers, namely $\mathbb{Z}[\omega]$, 
where $\omega = (-1 + \sqrt{-3})/2$ is a primitive cube root of unity. Its correctness relies 
on the following lemma (compare it with Lemma 1). 

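Lemma 2. Let $n = pq$, where $p$, $q$ are distinct primes in $\mathbb{Z}[\omega]$, $3 \nmid \mathrm{N}n$ and 
$\mathrm{N}q \equiv 2\mathrm{N}p - 1 \pmod 9$. If $(z|n)_3 = 1$, then 

\[ z^{(\mathrm{N}p - 1)(\mathrm{N}q - 1)/9} \equiv 1,\ \omega \text{ or } \omega^2 \pmod n . \]

Proof. See [16, Lemma 1]. □ 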

Essentially, that scheme suggests generating two primes $p, q \in \mathbb{Z}[\omega]$ such 
that $p \equiv 8 + 6\omega \pmod 9$ and $q \equiv 5 + 6\omega \pmod 9$, and computing $n = pq$. The 
public key is $n$ and the private key is $d = [(\mathrm{N}p - 1)(\mathrm{N}q - 1) + 9]/27$, where 
$\mathrm{N}$ denotes the norm. A message $m$ is signed by computing $\tilde m = R(m)$ (for an 
appropriate function R) and m = (1 — [(1 — co)m + 1], where uj* = (m|n) 3 ; 

the signature is $s = \hat m^d \bmod n$. The signature is then verified by cubing $s$, and 
if accepted, the message $m$ is recovered. 

We need an analogue of Proposition 1. 

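Proposition 3. Let $n = pq$, where $p$, $q$ are primes in $\mathbb{Z}[\omega]$, $\mathrm{N}p \equiv 7 \pmod 9$ 
and $\mathrm{N}q \equiv 4 \pmod 9$, and let $d = [(\mathrm{N}p - 1)(\mathrm{N}q - 1) + 9]/27$. If $(z|n)_3 = \omega$ or 
$\omega^2$, then 

\[ \gcd\bigl(z^{3d} - \omega^k z \ (\mathrm{mod}\ n),\ n\bigr) = p \text{ or } q , \]

for some $k \in \{0, 1, 2\}$. 

Proof. We first note that $(\mathrm{N}p - 1)/3 \equiv 2 \pmod 3$ and $(\mathrm{N}q - 1)/3 \equiv 1 \pmod 3$. 
So, $z^{3d} = z^{(\mathrm{N}p - 1)(\mathrm{N}q - 1)/9}\, z \equiv (z|p)_3^{(\mathrm{N}q - 1)/3}\, z \equiv (z|p)_3\, z \pmod p$, and similarly 
$z^{3d} \equiv (z|q)_3^{(\mathrm{N}p - 1)/3}\, z \equiv (z|q)_3^2\, z \pmod q$. The proposition now follows by 
observing that $(z|p)_3 \ne (z|q)_3^2$ because, by hypothesis, $(z|n)_3 = (z|p)_3\,(z|q)_3 = \omega$ or 
$\omega^2$. □ 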
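From this proposition, we can mimic the forgery presented against the 
Rabin-Williams scheme. The adversary has to find $\beta_1, \ldots, \beta_\ell$ such that (a) for 
$1 \le j \le B$, $\sum_{1 \le i \le \ell} \beta_i\, v_{j,i} \equiv 0 \pmod 3$; and (b) 
$\sum_{1 \le i \le \ell} \beta_i\, c_i \equiv 1, 2 \pmod 3$, where 
$3c_i := \bigl(\sum_{j \in J(n,B)} v_{j,i}\bigr) \bmod 9$ and 
$J(n,B) := \{1 \le j \le B : p_j \text{ is prime in } \mathbb{Z}[\omega] \text{ and } (p_j|n)_3 = \omega \text{ or } \omega^2\}$. 
She then computes $V_\tau = (v_{1,\tau}, \ldots, v_{B,\tau}) \in (\mathbb{Z}/9\mathbb{Z})^B$, 
where $v_{j,\tau} = \sum_{1 \le i \le \ell} \beta_i\, v_{j,i} - 9\gamma_j$ (for some $\gamma_j \in \mathbb{Z}$), 
$r_\tau = \prod_{1 \le j \le B} p_j^{v_{j,\tau}/3} \pmod n$ and 
$s_\tau = \prod_{1 \le i \le \ell} s_i^{\beta_i} \prod_{1 \le j \le B} p_j^{-3\gamma_j} \bmod n$. Finally, by computing 

\[ \gcd\bigl(s_\tau - \omega^k r_\tau \ (\mathrm{mod}\ n),\ n\bigr) \tag{13} \]

for some $k \in \{0, 1, 2\}$, she finds the factors of $n$. 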

5 Applications 

As an application, we will analyze the consequences of the previously described 
forgeries (Section 3) when the Rabin-Williams signature scheme is employed 
with the PKCS#1 v2.0 message encoding method as specified in [4]. We note, 
however, that the PKCS#1 standard is not expressly intended for use with the 
Rabin-Williams scheme but rather with the plain RSA scheme. 

In the second part, we will present an algorithm which reduces the complexity 
of the forgery from $n$ to $\sqrt{n}$. This algorithm is not restricted to PKCS#1: it 
remains applicable whatever message encoding method is employed. 

5.1 PKCS #1 Encoding Method 

We only briefly review the PKCS#1 message encoding method and refer the 
reader to [4] for details. PKCS#1 supports signature schemes with appendix. 
(In such a scheme, the message must accompany the signature in order to verify 
the validity of the signature.) The set of message representatives is thus given 
by $\mathcal{M}_R = \{\tilde m \in (\mathbb{Z}/n\mathbb{Z})^* : \tilde m \equiv 10 \pmod{16}\}$ (cf. Remark 1). 

Let $m$ be the message being encoded into the message representative $\tilde m$. 
First, a hash function is applied to $m$ to produce the hash value $H = \mathrm{Hash}(m)$. 
Next, the hash algorithm identifier and the hash value $H$ are combined into 
an ASN.1 value and DER-encoded (see [11] for the relevant definitions). Let $T$ 
denote the resulting DER-encoding. Similarly to what is done in ISO 9796 [3], 
we append the octet $0A_{16}$ to obtain the data string $D = T \,\|\, 0A_{16}$ (the reason 
is to ensure that $D$, viewed as an integer, is congruent to 10 modulo 16). The 
encoded message $EM$ is then formed by concatenating the block-type octet $BT$, 
the padding string $PS$, the $00_{16}$ octet and the data string $D$; or schematically, 

\[ EM = 00_{16} \,\|\, BT \,\|\, PS \,\|\, 00_{16} \,\|\, D , \tag{14} \]

where $BT$ is a single octet containing the value $00_{16}$ or $01_{16}$. Let $\|n\|$ and $\|D\|$ 
respectively denote the octet-length of modulus $n$ and of $D$. When $BT = 00_{16}$, 
$PS$ consists of $\|n\| - \|D\| - 3$ octets having value $00_{16}$; when $BT = 01_{16}$, $PS$ 
consists of $\|n\| - \|D\| - 3$ octets having value $FF_{16}$. 

The message representative $\tilde m$ is the integer representation of $EM$. 
Remark 4. Three hash functions are recommended: SHA-1 [1], MD2 [5], and 
MD5 [6]. The default function is SHA-1; MD2 and MD5 are only recommended 
for compatibility with existing applications based on PKCS#1 v1.5. 

In what follows, we will assume that SHA-1 is the hash function used. In 
that case, the data string $D$ consists of $(15 + 20 + 1) = 36$ octets (= 288 bits). 
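
The encoding of Eq. (14) with SHA-1 can be sketched as follows. This is an illustration under stated assumptions: the function name `encode` and its parameters are ours, the 15-octet constant is the usual DER prefix of a SHA-1 DigestInfo, and the trailing octet 0A is the adaptation described above (it is not part of plain PKCS#1).

```python
import hashlib

# DER prefix of a SHA-1 DigestInfo (15 octets); the 20-octet hash follows it.
SHA1_DER_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")

def encode(message: bytes, modulus_octets: int = 128, block_type: int = 1) -> int:
    """Return the message representative: the integer value of
    EM = 00 || BT || PS || 00 || D, with D = T || 0A."""
    T = SHA1_DER_PREFIX + hashlib.sha1(message).digest()
    D = T + b"\x0a"                                   # 36 octets for SHA-1
    pad_len = modulus_octets - len(D) - 3
    pad_octet = b"\xff" if block_type == 1 else b"\x00"
    EM = b"\x00" + bytes([block_type]) + pad_octet * pad_len + b"\x00" + D
    assert len(EM) == modulus_octets
    return int.from_bytes(EM, "big")

m_type1 = encode(b"abc")                  # BT = 01, 1024-bit modulus
m_type0 = encode(b"abc", block_type=0)    # BT = 00
print(m_type1 % 16, m_type0.bit_length())
```

With block type 0 the representative is just $D$ read as an integer (fewer than 288 bits), whereas with block type 1 and a 128-octet modulus it equals $(2^{713} - 1)\,2^{296} + D$. In both cases it is congruent to 10 modulo 16.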



Type 0 encoding. With type 0, i.e., when $BT = 00_{16}$, the message 
representatives $\tilde m_i$ are 288 bits long, whatever the size of the modulus. We have 
seen that the effectiveness of our forgeries is related to the smoothness (modulo $n$) 
of the $\hat m_i$'s that appear in Eq. (1), where $\hat m_i = \tilde m_i$ or $\tilde m_i/2$ according to the 
value of $(\tilde m_i|n)$. So, when $BT = 00_{16}$, each $\hat m_i$ is at most a 288-bit integer and we 
can hope that they factor (in $\mathbb{Z}$) into small primes. 



Type 1 encoding. Type 1 encoding, i.e., $BT = 01_{16}$, is the recommended way 
to encode a message. In that case, the message representatives are longer. For a 
1024-bit modulus, these are 1016-bit integers (the leading octet in $EM$ is $00_{16}$). 
Therefore, the $\hat m_i$'s are unlikely to be smooth. But what we ultimately need 
is to find smooth $\hat m_i$'s modulo $n$, that is, we need to find an $\hat M_i \equiv \hat m_i \pmod n$ 
so that $\hat M_i$ is smooth as an element of $\mathbb{Z}/n\mathbb{Z}$ (cf. Footnote 1). As already noted 
in [8], if the 1024-bit modulus has the special form 

\[ n = 2^{1023} \pm t , \tag{15} \]

then it is easy to find an $M_i$ whose magnitude is comparable to that of $t$. Indeed, 
when $BT = 01_{16}$, the padding string is formed with $(128 - 36 - 3) = 89$ octets 
having value $FF_{16}$. Therefore, considering the data string $D_i$ as a 288-bit integer, 
we can write from Eq. (14), $\tilde m_i = \{(2^8)^{89} + [(2^8)^{89} - 1]\}(2^8)^{37} + D_i$. Hence, 
$\hat m_i = (2^{713} - 1)\,2^{296} + D_i$ or $(2^{713} - 1)\,2^{295} + D_i/2$ according to $(\tilde m_i|n) = 1$ or 
$-1$. So, defining $M_i := 2^{g_i} \hat m_i - n$ and setting $\hat M_i := 2^{-g_i} M_i \equiv \hat m_i \pmod n$, an 
appropriate choice for $g_i$ will remove the term $2^{713+296}$ (resp. $2^{713+295}$). Namely, 
we set 

\[ M_i = \begin{cases} 2^{14}\, \hat m_i - n = 2^{14} D_i - 2^{310} \mp t & \text{if } (\tilde m_i|n) = 1 , \\ 2^{15}\, \hat m_i - n = 2^{14} D_i - 2^{310} \mp t & \text{if } (\tilde m_i|n) = -1 . \end{cases} \tag{16} \]

Hence, since a special modulus of the form (15) with a square-free $t$ as small 
as 400 bits offers the same security as a regular 1024-bit modulus [14], the $M_i$'s 
as given in Eq. (16) can already be as small as 400-bit integers. Consequently, 
hoping that the $M_i$'s factor into small integers (in $\mathbb{Z}$),^3 
$\hat M_i = 2^{-g_i} M_i \equiv \hat m_i \pmod n$ will be smooth as elements of $\mathbb{Z}/n\mathbb{Z}$. 

^3 The probability that a 400-bit integer is smooth is relatively small; but see § 3.3 for 
some possible alternatives. 
5.2 Arbitrary Encoding 

For general ("form-free") message encoding methods and moduli, the message 
representatives have roughly the same length as the modulus. We now present 
a generic method which, on input an arbitrary message representative, outputs a 
"message equivalent" whose size is about that of $\sqrt{n}$. 

As in [8], we observe that if we can find integers $a_i$ and $b_i$ so that $a_i$ is smooth 
and 

\[ M_i := a_i\, \hat m_i - b_i\, n \tag{17} \]

is smooth as well, then $\hat M_i := a_i^{-1} M_i \equiv \hat m_i \pmod n$ is also smooth as an 
element of $\mathbb{Z}/n\mathbb{Z}$. 

Explicitly, let $a_i = (-1)^{u_{0,i}} \prod_{1 \le j \le B} p_j^{u_{j,i}}$ and $M_i = (-1)^{w_{0,i}} \prod_{1 \le j \le B} p_j^{w_{j,i}}$ be 
the prime factorizations of $a_i$ and $M_i$; then, we have 

\[ \hat M_i \equiv \hat m_i \equiv (-1)^{v_{0,i}} \prod_{1 \le j \le B} p_j^{v_{j,i}} \pmod n , \tag{18} \]

where $v_{j,i} = w_{j,i} - u_{j,i}$, as required. 

A related problem has been addressed by de Jonge and Chaum in [9, §3.1] 
(see also [10]): they describe a method to find small integers $a_i$ and $M_i$ satisfying 
Eq. (17). The "Pigeonhole Principle" (e.g., see [22, Chapter 30]) quantifies how 
small $a_i$ and $M_i$ can be. 

Proposition 4 (Pigeonhole Principle). Let $n, \hat m \in \mathbb{Z}$ be two integers. For any 
positive integer $A < n$, there exist integers $a^*$ and $b^*$ such that 

\[ 0 < |a^*| \le A \quad \text{and} \quad |a^* \hat m - b^* n| < \lceil n/A \rceil . \]

Proof. Consider the $(A + 1)$ "pigeons" given by the numbers $P_a := a\, \hat m \bmod n$ 
($0 \le a \le A$), i.e., each pigeon is an integer between 0 and $(n - 1)$. We now form 
the $A$ pigeonholes given by the integer intervals 

\[ I_a = \begin{cases} \bigl[\, (a - 1) \lceil n/A \rceil ,\ a \lceil n/A \rceil \,\bigr[ & \text{for } 1 \le a \le A - 1 \\ \bigl[\, (A - 1) \lceil n/A \rceil ,\ n - 1 \,\bigr] & \text{for } a = A \end{cases} \]

Since there are more pigeons than pigeonholes, two pigeons are sitting in the same 
hole. Suppose these are pigeons $P_x$ and $P_y$ and that they are in hole $I_a$; in other 
words, $P_x, P_y \in I_a \implies |P_x - P_y| < \lceil n/A \rceil \implies |(x - y)\, \hat m \bmod n| < \lceil n/A \rceil$. 

Noting that $0 \le x, y \le A$ and $x \ne y$, we set $a^* = x - y$ and so $0 < |a^*| \le A$. 
Hence, $|a^* \hat m \bmod n| < \lceil n/A \rceil$ and the proposition follows by setting 
$b^* = \lfloor a^* \hat m / n \rfloor$. 

□ 
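
For small moduli, the existence statement of Proposition 4 can be checked by exhaustive search. The sketch below is only meant as an illustration (it is far too slow for real moduli): the function name `small_multiplier` and the numerical values are ours, and only positive multipliers are scanned, which is enough since replacing $a^*$ by $-a^*$ merely changes the sign of the residue.

```python
def small_multiplier(m_hat: int, n: int, A: int):
    """Search 1 <= a <= A for the first a such that a*m_hat - b*n is, in
    absolute value, below ceil(n/A) for some integer b. Returns (a, b, r)
    with r = a*m_hat - b*n, or None if the search fails."""
    bound = -(-n // A)                    # ceil(n / A)
    for a in range(1, A + 1):
        r = a * m_hat % n
        if r > n // 2:
            r -= n                        # centred residue, smallest |r|
        if abs(r) < bound:
            b = (a * m_hat - r) // n      # exact division by construction
            return a, b, r
    return None

n, m_hat, A = 26882749, 13271605, 5185    # A = ceil(sqrt(n)) for this n
print(small_multiplier(m_hat, n, A))
```

By the proposition the search cannot fail when $A < n$, so the function always returns a triple with $|a| \le A$ and $|r| < \lceil n/A \rceil$.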



In particular, taking $A = \lceil \sqrt n \rceil$, there exist $a_i, b_i \in \mathbb{Z}$ such that $|a_i| \le \lceil \sqrt n \rceil$ 
and $|M_i| \le \lceil n / \lceil \sqrt n \rceil \rceil - 1 = \lfloor n / \lceil \sqrt n \rceil \rfloor \le \lfloor \sqrt n \rfloor$. This means that for a given 
1024-bit modulus $n$ and any $\hat m_i$ (corresponding to the message representative 
$\tilde m_i$), there are integers $a_i$ and $b_i$ such that both $a_i$ and $M_i = a_i \hat m_i - b_i n$ are 512-bit 
integers, whatever the message encoding method. It remains to explain how to 
find $a_i$ and $b_i$. A simple means is given by the extended Euclidean algorithm [12, 
Algorithm X, p. 325]. 
Algorithm 2. On inputs $n$ and $\hat m_i \in \mathbb{Z}/n\mathbb{Z}$, this algorithm computes a solution 
$a_i, b_i, M_i$ satisfying Eq. (17) so that $|a_i|$ and $|M_i|$ are $\le \lceil \sqrt n \rceil$. It makes use of a 
vector $(u_1, u_2, u_3)$ such that $u_1 \hat m_i - u_2 n = u_3$ always holds. 

1. [initialization] Set $(u_1, u_2, u_3) \leftarrow (0, -1, n)$ and $(v_1, v_2, v_3) \leftarrow (1, 0, \hat m_i)$. 

2. [Euclid] Compute $Q \leftarrow \lfloor u_3 / v_3 \rfloor$ and $(t_1, t_2, t_3) \leftarrow (u_1, u_2, u_3) - (v_1, v_2, v_3)\, Q$. 
   Then set $(u_1, u_2, u_3) \leftarrow (v_1, v_2, v_3)$ and $(v_1, v_2, v_3) \leftarrow (t_1, t_2, t_3)$. 

3. [loop] If $u_3 > \lceil \sqrt n \rceil$, then return to Step 2. 

4. [output] Output $a_i \leftarrow u_1$, $b_i \leftarrow u_2$ and $M_i \leftarrow u_3$. 



Example 2. Using the modulus of Example 1 (i.e., $n = 26882749$), suppose we 
are given a message $m_i$ whose encoding is $\tilde m_i = 26543210$ ($\equiv 10 \pmod{16}$). 
Since $(\tilde m_i|n) = -1$, we have $\hat m_i = \tilde m_i/2 = 13271605 = 5 \cdot 79 \cdot 33599$. This $\hat m_i$ 
is not smooth; we thus apply Algorithm 2 and obtain $M_i = a_i \hat m_i - b_i n$ with 
$a_i = 1821$, $b_i = 899$, $M_i = 1354$. We have $a_i = 3 \cdot 607$ and $M_i = 2 \cdot 677$. Hence, 
$\hat m_i \equiv 2 \cdot 3^{-1} \cdot 607^{-1} \cdot 677 \pmod n$. 

    $u_1$ ($a_i$)   $u_2$ ($b_i$)   $u_3$ ($M_i$)
    -------------   -------------   -------------
    1               0               13271605
    -2              -1              339539
    79              39              29584
    -871            -430            14115
    1821            899             1354
    -19081          -9420           575
    39983           19739           204
    -99047          -48898          167
    139030          68637           37
    -655167         -323446         19
    794197          392083          18
    -1449364        -715529         1

Note that Algorithm 2 does not always give the best possible solution. Considering 
all the steps of the extended Euclidean algorithm (see above), the best 
solution for our example is given by $a_i = 79$ and $M_i = 29584 = 2^4 \cdot 43^2$, which 
yields $\hat m_i \equiv 2^4 \cdot 43^2 \cdot 79^{-1} \pmod n$. ◊ 
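
Algorithm 2 translates almost line for line into code. The sketch below follows its four steps on the numbers of Example 2; the function name `message_equivalent` is ours, and the input is assumed to be coprime to $n$ so that the loop stops before a division by zero.

```python
from math import isqrt

def message_equivalent(m_hat: int, n: int):
    """Truncated extended Euclid: return (a, b, M) with a*m_hat - b*n = M,
    stopping as soon as M <= ceil(sqrt(n))."""
    assert 0 < m_hat < n
    limit = isqrt(n)
    if limit * limit < n:
        limit += 1                        # ceil(sqrt(n))
    u = (0, -1, n)                        # invariant: u1*m_hat - u2*n = u3
    v = (1, 0, m_hat)
    while True:
        quotient = u[2] // v[2]
        t = tuple(ui - quotient * vi for ui, vi in zip(u, v))
        u, v = v, t                       # Step 2 [Euclid]
        if u[2] <= limit:                 # Step 3 [loop]
            return u                      # Step 4 [output]

a, b, M = message_equivalent(13271605, 26882749)
print(a, b, M)
```

On the inputs of Example 2 this returns $(1821, 899, 1354)$, the fifth row of the table. To look for smoother pairs such as $a_i = 79$, $M_i = 29584$, one would instead collect every intermediate triple and test each for smoothness.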

6 Conclusion 



This paper presented a specialized version of the Coron-Naccache-Stern signature 
forgery. It applies to any Rabin-type signature scheme and to any (even) public 
verification exponent. Furthermore, contrary to the case of RSA, the forgery is 
universal: it yields the value of the private key. 
Abstract. This paper presents a new type of powerful cryptanalytic 
attack on public-key cryptosystems, extending the more commonly studied 
adaptive chosen-ciphertext attacks. In the new attacks, an adversary 
is not only allowed to submit to a decryption oracle (valid or invalid) 
ciphertexts of her choice, but also to emit a "dump query" prior to the 
completion of a decryption operation. The dump query returns intermediate 
results that have not been erased in the course of the decryption 
operation, thereby allowing the adversary to gain vital advantages in 
breaking the cryptosystem. 

We believe that the new attack model approximates more closely 
existing security systems. We examine its power by demonstrating that 
most existing public-key cryptosystems, including OAEP-RSA, are 
vulnerable to our extended attacks. 

Keywords. Encryption, provable security, chosen-ciphertext security, 
ciphertext validity, OAEP-RSA, ElGamal encryption. 
1 Introduction 

Provable security is gaining more and more popularity. Known only to 
theoreticians a decade ago, provable security has now become a standard 
attribute of any cryptographic scheme. 

A scheme is said to be provably secure if the insecurity of the scheme implies 
that some widely-believed intractable problem is solvable. The security of a scheme 
is measured as the ability to resist an adversarial goal in a given adversarial 
model. The standard security notion for public-key encryption schemes is 
indistinguishability under adaptive chosen-ciphertext attacks (IND-CCA2) [2]. 

This paper introduces a stronger attack. In addition to having access to a 
decryption oracle, the adversary can perform a "memory dump" of the decryption 
oracle at any time; the sole restriction is that secret data (e.g., private keys) are 
inaccessible. We believe that this new attack model approximates more closely 
existing security systems, many of which are built on such operating systems as 
Unix and Windows where a reasonably privileged user can interrupt the operation 
of a computing process and inspect its intermediate results at ease. 

We show that many IND-CCA2 encryption schemes are vulnerable within 
our extended attack scenario. Examples include OAEP-RSA and several ElGa- 
mal variants. A problem in almost all those schemes is that the validity of the 
ciphertext cannot be checked until after all the decryption is completed. By 
emitting a dump query prior to the validity checking, the adversary may get ac- 
cess to some secret information resulting from the raw decryption of an (invalid) 
ciphertext and thereby may weaken the scheme. 

We hope that in the future cryptographers will analyze the security of their 
schemes in our extended setting. We note that the scope of our attacks is very 
broad and may apply to other cryptographic primitives as well (e.g., digital sig- 
natures). In the same vein as in [2], it may also be interesting to see what are the 
different implications and separations implied by our attacks when considering 
various adversarial goals. 

Organization. The rest of this paper is organized as follows. In the next section, 
we review current security notions for encryption schemes and introduce the new 
notion of strong chosen-ciphertext security. Applicative examples of our attack 
scenario are also provided. Next, in Section 3, we apply our attack model to the 
celebrated OAEP-RSA and show that it is insecure under our extended setting. 
Similar attacks against various ElGamal variants are also presented. Finally, we 
conclude in Section 4. 

2 Strong Adaptive Chosen-Ciphertext Security 

2.1 Security Notions 

Indistinguishability of encryptions, defined by Goldwasser and Micali [14], cap- 
tures the intuition that an adversary should not be able to obtain any partial 
information (other than its length) about a message given its encryption. In their 
paper, Goldwasser and Micali study the case of completely passive adversaries, 
i.e., adversaries that can only eavesdrop.^1 They do not consider adversaries injecting 
messages into a network or otherwise influencing the behavior of parties in the 
network. 

To deal with active attacks, Naor and Yung [18] introduced security against 
chosen-ciphertext attacks. They model such an attack by allowing the adversary 
to get access to a "decryption oracle" and obtain decryptions of her choice. When 
the security goal is indistinguishability, the attack of [18] runs as follows. During 
the first stage, called find stage, the adversary (viewed as a polynomial-time 
machine) has access to a decryption oracle. At the end of the stage, the adversary 
generates two (equal-length) plaintexts $m_0$ and $m_1$. In the second stage, called 
guess stage, the adversary receives the encryption of $m_b$, say $c_b$, with $b$ randomly 
drawn from $\{0, 1\}$. The attack is successful if the adversary recovers the value 
$b$ or, equivalently, if the adversary distinguishes the encryption of $m_0$ from that 
of $m_1$. 

Rackoff and Simon [21] later generalized this attack by allowing the adversary 
to retain access to the decryption oracle after having obtained the target 
ciphertext $c_b$. Since the adversary could simply submit the target ciphertext itself 
to the decryption oracle, Rackoff and Simon restrict the adversary's behavior by 
not allowing her to probe the decryption oracle with $c_b$. 

In summary, chosen-ciphertext attacks can be classified in two categories: 

- (Static Chosen-Ciphertext Attacks [18]) The adversary has access to the 
  decryption oracle only prior to obtaining the target ciphertext.^2 

- (Adaptive Chosen-Ciphertext Attacks [21]) Not only can the adversary get 
  access to the decryption oracle during the find stage but also during the 
  guess stage. The only restriction is not to submit the target ciphertext itself 
  to the decryption oracle. 

As explicitly pointed out in [16], we stress that the adversary may query the 
decryption oracle with invalid ciphertexts. Although seemingly useless, such 
attacks are not innocuous. The decryption oracle can for example be used to learn 
whether a chosen ciphertext is valid or not. From this single bit of information 
and by iterating the process, Bleichenbacher successfully attacked several 
implementations of protocols based on PKCS #1 v1.5 [5]. More recently, Manger 
pointed out the importance of preventing an attacker from distinguishing 
between rejections at the various steps of the decryption algorithm, say, using 
timing analysis [17]. The lesson is that implementors must ensure that the reasons 
for which a ciphertext is rejected are hidden from the outside world. 

^1 We note, however, that in the public-key setting, an adversary can always mount a 
chosen-plaintext attack since encryption is public, by definition. 

^2 In the past, this attack has also been called "lunch-time attack" or "midnight 
attack". 
2.2 Chosen-Ciphertext Attacks with Memory Dump

Now we consider more powerful adversaries who not only can obtain the plain- 
texts corresponding to chosen ciphertexts, but can also invade a user’s computer 
and read the contents of its memory (i.e., non-erased internal data). In [24], 
Shoup introduced the strong adaptive corruption model, i.e., a memory dump 
attack combined with forward secrecy in the context of key exchange protocols. 
We apply this notion to the security of encryption algorithms and consider strong 
adaptive chosen-ciphertext security, i.e., a memory dump attack combined with 
chosen-ciphertext security.^ 

Definition 1 (Strong Chosen-Ciphertext Query). Let $k$ be a security
parameter that generates matching encryption/decryption keys $(e, d)$ for each user
in the system. A strong chosen-ciphertext query is a process which, on input $1^k$
and $e$, obtains either

- the plaintext (relative to $d$) corresponding to a chosen ciphertext; or

- an indication that the chosen ciphertext is invalid; or

- non-erased internal states of the decryption oracle decrypting the submitted
ciphertext, when a “dump” query is submitted.

If the decryption oracle wants to prevent the exposure of an
internal state, or part of it, it can use the operation

atomic_action[...] .

The term “atomic” means that the action must be executed without external
interruptions (e.g., memory core dump, system crash, user signal to kill the
transaction, and so on [26]). After the execution has completed successfully,
the internal variables used by the atomic process are erased.

Most cryptosystems proposed so far do not specify which steps in the
decryption process need to be protected from external probes. Such an implementation
matter can actually be viewed as a cryptographic design matter: the designer of
a cryptosystem should state which steps in the decryption process must
be protected by an “atomic action”. Of course, the best design is a cryptosystem
wherein only a single operation (using the private key) is executed atomically,
in the above sense.

To fix the ideas, suppose that an adversary attacks the plain RSA algorithm,
as originally described in [22]. Let $n$ denote the RSA modulus and let $(e, d)$
denote the pair of encryption/decryption keys. If the (plain) RSA decryption,
$m = c^d \bmod n$, is not performed atomically then an appropriate “dump” query
may reveal the values of $m$, $c$, $d$ and $n$ as they are, during the course of the
decryption, available somewhere in the memory.

® Similarly, for digital signature schemes, stronger security notions can be defined such 
as, for example, existential unforgeability under adaptive chosen-message attacks with 
memory dump. 
However, by using the operation atomic_action[$m \leftarrow c^d \bmod n$], which is
made up of the following six unit operations

atomic_action
  1  MEM[1] ← n
  2  MEM[2] ← c
  3  MEM[3] ← d
  4  MEM[4] ← MEM[2]^MEM[3] mod MEM[1]
  5  m ← MEM[4]
  6  erase internal memory states MEM[1], MEM[2], MEM[3] and MEM[4]

a “dump” query yields information only on $n$ and $c$ (which are publicly known)
and on $m$. In particular, note that the value $d$ (MEM[3]) is
inaccessible since it is erased from the memory (cf. Step 6).
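
The effect of Step 6 can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: the function name `rsa_decrypt`, the `mem` dictionary standing in for MEM[1..4], and the textbook-sized key (n = 3233, e = 17, d = 2753) are all hypothetical and chosen for illustration; they are not part of the model defined above.

```python
def rsa_decrypt(n, d, c, atomic):
    """Toy model of m <- c^d mod n; returns (m, dump).

    `dump` is what a hypothetical "dump" query issued right after the
    decryption would reveal: the public values n and c, the output m,
    and whatever is still left in the working memory MEM[1..4].
    """
    mem = {}
    mem[1] = n                              # step 1
    mem[2] = c                              # step 2
    mem[3] = d                              # step 3
    mem[4] = pow(mem[2], mem[3], mem[1])    # step 4
    m = mem[4]                              # step 5
    if atomic:
        mem.clear()                         # step 6: erase MEM[1..4]
    dump = {"n": n, "c": c, "m": m, "MEM": dict(mem)}
    return m, dump


n, e, d = 3233, 17, 2753        # toy key, n = 61 * 53 (illustrative only)
c = pow(65, e, n)

_, leaked = rsa_decrypt(n, d, c, atomic=False)
_, erased = rsa_decrypt(n, d, c, atomic=True)
print(leaked["MEM"].get(3))     # the private exponent is still in memory
print(erased["MEM"])            # nothing beyond n, c and m is exposed
```

Without the erasure step the dump contains MEM[3], i.e. the private exponent; with it, the dump carries no more than the public inputs and the output.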

In our model, the oracle’s operations using the private key are always executed
atomically. This restriction is the weakest possible: allowing the adversary to have
access to the user’s private key (e.g., $d$ in the above example) has the same effect as
allowing the adversary to submit the target ciphertext itself to the decryption
oracle. Note here that the power of a strong adaptive chosen-ciphertext attack
depends heavily on the number of (unit) operations contained inside an atomic
action, so we can consider this number of operations as one of the security
evaluation criteria of a given cryptosystem.

Definition 2 (Strong [Static/Adaptive] Chosen-Ciphertext Attack). A
strong static chosen-ciphertext attack consists of the following scenario:

1. On input a security parameter $k$, the key generation algorithm $\mathcal{K}$ is run,
generating a public key and a private key for the encryption algorithm $\mathcal{E}$.
The adversary of course obtains the public key, but the private key is kept
secret.

2. [Find stage] The adversary makes polynomially (in $k$) many strong
chosen-ciphertext queries (as in Definition 1) to a decryption oracle.

(The adversary is free to construct the ciphertexts in an arbitrary way; it
is certainly not required to compute them using the encryption algorithm.)

3. The adversary prepares two messages $m_0$, $m_1$ and gives these to an
encryption oracle. The encryption oracle chooses $b \in \{0, 1\}$ at random, encrypts
$m_b$, and gives the resulting “target ciphertext” $c'$ to the adversary. The
adversary is free to choose $m_0$ and $m_1$ in an arbitrary way, except that they
must be of the same length.

4. [Guess stage] The adversary outputs $b' \in \{0, 1\}$, representing its “guess”
on $b$.

In a strong adaptive chosen-ciphertext attack, the adversary still has access to
the decryption oracle after having received the target ciphertext: a second series
of polynomially (in $k$) many strong chosen-ciphertext queries may be run. The
only restriction is not to probe the decryption oracle with the target
ciphertext $c'$.
The success probability in the previous attack scenario is defined as

$\Pr[b' = b]$ .

Here, the probability is taken over coin tosses of the adversary, the key generation
algorithm $\mathcal{K}$ and the encryption algorithm $\mathcal{E}$, and $(m_0, m_1) \in M^2$ where $M$ is
the domain of the encryption algorithm $\mathcal{E}$.

Definition 3 (Strong [Static/ Adaptive] Chosen-Ciphertext Security). 

An encryption scheme is secure if every strong (static/adaptive) chosen-cipher- 
text attack (as in Definition 2) succeeds with probability at most negligibly greater 
than 1/2. 



2.3 Real-World Applications 

When our strong adaptive chosen-ciphertext attack is mounted, the adversary 
may learn (i.e., “memory core-dump”) the entire internal state of the decryption 
oracle, excluding data that has been explicitly erased. 

It may be useful to illustrate the definition of security with a simple exam- 
ple. Consider the case of a security-enhanced electronic mail system where a 
public-key cryptosystem is used to encrypt messages passed among users. It is 
a common practice for an electronic mail user to include the original message 
s/he received into a reply to the message. This practice provides an avenue for 
chosen-ciphertext attacks, as an adversary can send a ciphertext to a target user 
and expect the user to send back the corresponding plaintext as part of the reply. 
For instance, a reply to a message may be as follows [28]. 

(original message) 

> 

> Hi , is Yum-Cha still on tonight ? 

> 



(reply to the message) 

Yes, it’s still on. I’ve already made the bookings. 



Now suppose one step further that, via computer viruses, the adversary can 
modify the target user’s electronic mail system to core-dump (i.e., automatically 
write the exact contents of the memory to a file) and send back the core-dump 
file in a stealthy way. This is a concrete example of our attack model. Another
concrete example arises when the user’s computer crashes suddenly: internal secret
information may then remain unerased.

We quote the following three articles as a basis for thinking about the relevance
of security against strong adaptive chosen-ciphertext attacks in real life.
How your privacy is caught in the Net by Duncan Campbell [7]:

“(. . . ) Hackers and government agencies are hard at work designing infor- 
mation stealing viruses. Six months ago, two of them popped up in the same 
week. “Caligula” was aimed at users who installed a privacy-protection
system called PGP. Once it infected a computer, it looked for a file holding the
secret keys to PGP. Then, automatically and silently, it transmitted the file 
to the hacker’s Internet site. Caligula could have taken any information it 
wanted. Another virus called “Picture” was aimed at the America On-line
(AOL) Internet service. Picture collected AOL users’ passwords and log-in
data, and sent them to a web site in China. Three months later, “Melissa” 
appeared. It automatically read users’ address books, and used the informa- 
tion to mail itself to all their friends and contacts. 

According to Roger Thompson, director of anti-virus security consulting firm 
ICSA, information theft is a price to be paid for the advent of the Internet. 

(...) 

According to former ASIO deputy director Gerald Walsh, who proposed the 
new powers: The introduction of other commands, such as diversion, copy, 
send, (or) dump memory to a specified site, would greatly enhance criminal 
investigations. (. . . )”. 

Phone.com takes aim at WAP security hole in eWEEK [11]: 

“(...) In current WAP transmissions, data must use two security protocols 
— WTLS during the wireless part of the journey and SSL once the data 
hits the wires. There is a split second when the data must decrypt and 
re-encrypt to switch from one protocol to the other. A security flaw could 
occur if someone was able to crash the machine in the split second between 
decryption and re-encryption, causing a memory dump to the disk. (. . . )”. 

Memory reconstruction attack in RSA Official Guide to Cryptography [6]:
“(...) Often, sensitive material is not stored on hard drives but does appear 
in a computer’s memory. For example, when the program you’re running 
allocates some of the computer’s memory, the OS tags that area of memory 
as unavailable, and no one else can use it or see it. When you’re finished with 
that area of memory, though, many operating systems and programs simply 
“free” it — marking it as available — without overwriting it. This means that 
anything you put into that memory area, even if you later “deleted” it, is still 
there. A memory reconstruction attack involves trying to examine all possible 
areas of memory. The attacker simply allocates the memory you just freed 
and sees what’s left there. 

A similar problem is related to what is called “virtual memory.” The mem- 
ory managers in many operating systems use the hard drive as virtual mem- 
ory, temporarily copying to the hard drive any data from memory that has 
been allocated but is momentarily not being used. When that information is 
needed again, the memory manager swaps the current virtual memory for the 
real memory. In August 1997, The New York Times published a report about 
an individual using simple tools to scan his hard drive. In the swap space, he 
found the password he used for a popular security application. (. . . )”. 
3 On the Power of Strong Adaptive Chosen-Ciphertext
Attacks

For the last few years, many new schemes have been proposed with provable 
security against chosen-ciphertext attacks. Before 1994, only theoretical (i.e., 
not very practical) schemes were proposed. Then Bellare and Rogaway [4] came 
up with the random oracle model [3] and subsequently designed in [4] a generic 
padding, called OAEP (Optimal Asymmetric Encryption Padding), to transform 
a one-way (partially) trapdoor permutation into a chosen-ciphertext secure cryp- 
tosystem. Other generic paddings, all validated in the random oracle model, were 
later given by Fujisaki and Okamoto [12] (improved in [13]), by Pointcheval [20], 
and by Okamoto and Pointcheval [19]. The first practical cryptosystem with 
provable security in the standard model is due to Cramer and Shoup [8]. They 
present an extended ElGamal encryption provably secure under the decisional 
Diffie-Hellman problem. 

In this section, we demonstrate that most “decrypt-then-validate”-type
cryptosystems with provable security (including OAEP-RSA and most ElGamal
variants) can be broken under our strong adaptive chosen-ciphertext attacks.



3.1 OAEP-RSA 

We give here a brief overview of OAEP-RSA and refer the reader to [4] for 
details. The decryption phase of OAEP-RSA is divided into three parts: (i) RSA 
decryption, (ii) validation, and (iii) output. 

Let $n = pq$ denote an RSA modulus, which is the product of two large
primes $p$ and $q$. Furthermore, let $e$ and $d$, satisfying
$ed \equiv 1 \pmod{\operatorname{lcm}(p-1, q-1)}$, respectively denote the public
encryption exponent and the private decryption exponent. We assume a hash
function $H : \{0,1\}^{k_m+k_1} \to \{0,1\}^{k_0}$ and a “generator” function
$G : \{0,1\}^{k_0} \to \{0,1\}^{k_m+k_1}$, where $k_m + k_0 + k_1$ is the
bit-length of $n$. The public parameters are $\{n, e, G, H\}$ and the secret parameters
are $\{d, p, q\}$.

A $k_m$-bit plaintext message $m$ is encrypted through OAEP-RSA as

$c = (s \| t)^e \bmod n$ with $s = (m \| 0^{k_1}) \oplus G(r)$ and $t = r \oplus H(s)$

for a random $r \in \{0,1\}^{k_0}$. Given a ciphertext $c$, $m$ is recovered as:

(i) RSA decryption

• $s \| t$ = atomic_action[$c^d \bmod n$];

• $z = s \oplus G(t \oplus H(s))$.

(ii) Validation

If the last $k_1$ bits of $z$ are not 0 then

• Erase the internal data; and

• Return “Invalid Ciphertext”.

(iii) Output

• Return the first $k_m$ bits of $z$.
The attack. The problem with OAEP-RSA resides in that the validity test
cannot be performed (i.e., the adversary cannot be detected) until after the RSA
decryption is completed [25]. Thus an attacker can freely mount a strong adaptive
chosen-ciphertext attack and extract the partial information from internal data
in the decryption oracle’s memory.

For example, consider the following game played by a strong chosen-ciphertext
adversary. Let $c = (s \| t)^e \bmod n$ denote the target ciphertext. First, the
adversary chooses a random $r$ and forms the (invalid) ciphertext $c' = c \cdot r^e \bmod n$.
Then she submits $c'$ to the decryption oracle. Just after the decryption oracle
computes $w'$ = atomic_action[$c'^d \bmod n$] (note that $w' \equiv (s \| t) \cdot r \pmod n$) but
before it detects and rejects the invalid ciphertext $c'$, the adversary submits a
“dump” query, and then obtains the internal data $w'$. From $w'$, she computes
$s \| t = w' \cdot r^{-1} \bmod n$ and $z = s \oplus G(t \oplus H(s))$. Finally, the adversary recovers $m$ as
the first $k_m$ bits of $z$, so it is easy to find a correct guess by using the recovered
$m$.
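
The blinding step of this game can be sketched as follows. This is a minimal toy model, not the actual OAEP-RSA code: the class `DecryptThenValidateOracle`, its methods, the function `attack`, and the textbook-sized key are hypothetical names and values introduced for illustration. The value `w` merely stands for the padded block s||t, and the unpadding $z = s \oplus G(t \oplus H(s))$, which uses only public functions, is omitted. The modular inverse via `pow(r, -1, n)` assumes Python 3.8 or later.

```python
class DecryptThenValidateOracle:
    """Toy decryption oracle: RSA step first, validity check afterwards."""

    def __init__(self, n, d):
        self.n = n
        self._d = d      # assumed to be touched only inside the atomic RSA step
        self.mem = {}    # non-erased internal data, visible to a "dump" query

    def rsa_step(self, c):
        # atomic_action[c^d mod n]: d is erased, the result w' stays in memory
        self.mem["w"] = pow(c, self._d, self.n)

    def dump(self):
        return dict(self.mem)

    def validate_and_output(self, is_valid):
        w = self.mem.pop("w")            # internal data erased on exit
        return w if is_valid(w) else None


def attack(oracle, n, e, c, r):
    c_blind = (c * pow(r, e, n)) % n     # c' = c * r^e mod n
    oracle.rsa_step(c_blind)
    w_blind = oracle.dump()["w"]         # w' = (s||t) * r mod n
    oracle.validate_and_output(lambda w: False)   # c' is rejected, but too late
    return (w_blind * pow(r, -1, n)) % n          # s||t = w' * r^{-1} mod n


n, e, d = 3233, 17, 2753      # toy key (illustrative only)
w = 1234                      # stands for the OAEP-padded value s||t
c = pow(w, e, n)              # target ciphertext
recovered = attack(DecryptThenValidateOracle(n, d), n, e, c, r=7)
print(recovered == w)
```

The rejection is modelled by a validity predicate that always fails; the point of the sketch is only that the dump happens between the RSA step and the check.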



3.2 Other Provably Secure Cryptosystems 

Similarly to the plain RSA cryptosystem, the plain ElGamal cryptosystem [10]
is malleable. We recall the attack hereafter.

Let $\mathcal{G} = \langle g \rangle$ be the cyclic group generated by $g$. The public encryption key
is $X = g^x$ and the private decryption key is $x$. A message $m$, considered as an
element of $\mathcal{G}$, is encrypted as $(c_1, c_2) = (g^y, X^y \cdot m)$ for some random integer $y$.
Given $(c_1, c_2)$, plaintext message $m$ is recovered as $m = c_1^{-x} \cdot c_2$.

If $(c_1, c_2)$ is the target ciphertext, then an adversary can compute the ciphertext
$(c_1', c_2') = (g^r \cdot c_1, X^r \cdot c_2)$ for some random $r$. Next, by submitting $(c_1', c_2')$ to
the decryption oracle, she recovers $m = m'$ = atomic_action[$c_1'^{-x}$] $\cdot\, c_2'$ from a
“dump” query.
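
The re-randomisation used in this attack can be checked on a toy instance. The sketch below assumes a small multiplicative group modulo the prime 467 with base 2 and arbitrary exponents; these parameters and the helper names `encrypt` and `decrypt` are hypothetical and serve only to illustrate the algebra (Python 3.8 or later is assumed for `pow(a, -1, p)`).

```python
p, g = 467, 2                 # toy group Z_p^* (illustrative only)
x = 127                       # private decryption key
X = pow(g, x, p)              # public encryption key X = g^x


def encrypt(m, y):
    """(c1, c2) = (g^y, X^y * m)."""
    return pow(g, y, p), (pow(X, y, p) * m) % p


def decrypt(c1, c2):
    """m = c1^{-x} * c2; only c1^{-x} would sit inside the atomic action."""
    return (pow(pow(c1, x, p), -1, p) * c2) % p


m = 100
c1, c2 = encrypt(m, y=213)                 # target ciphertext
r = 55
c1_new = (pow(g, r, p) * c1) % p           # c1' = g^r * c1
c2_new = (pow(X, r, p) * c2) % p           # c2' = X^r * c2
assert (c1_new, c2_new) != (c1, c2)        # the query differs from the target
recovered = decrypt(c1_new, c2_new)        # m' = m
print(recovered == m)
```

Since $c_1'^{-x} \cdot c_2' = g^{-(y+r)x} \cdot X^{y+r} \cdot m$, the blinding factors cancel and the oracle's intermediate result is the target plaintext.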

Several variants of the basic ElGamal scheme were proposed in order to make
it secure against chosen-ciphertext attacks. We review some of them in
chronological order of their appearance and analyze their resistance under our
strong chosen-ciphertext attacks. Here, for simplicity, the same notation $G$, $H$
is used for several schemes, but the functions $G$ and $H$ may have different domains
and images from those used in OAEP-RSA of Section 3.1.

Zheng and Seberry I [28]

• encryption: $(c_1, c_2) = (g^y, G(X^y) \oplus (m \| H(m)))$

• decryption: 1) $m \| t = G($atomic_action[$c_1^x$]$) \oplus c_2$

2) if $t = H(m)$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2') = (c_1, c_2 \oplus r)$ for a random $r$

2) recover $m$ from $(m' \| \cdots) \oplus r = m \| \cdots$

Zheng and Seberry II [28]

• encryption: $(c_1, c_2, c_3) = (g^y, H_s(m), z \oplus m)$ with $z \| s = G(X^y)$

• decryption: 1) $z \| s = G($atomic_action[$c_1^x$]$)$

2) $m = z \oplus c_3$

3) if $c_2 = H_s(m)$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2', c_3') = (c_1, c_2', c_3)$ with $c_2' \neq c_2$

2) recover $m = m'$

Zheng and Seberry III [28]

• encryption: $(c_1, c_2, c_3, c_4) = (g^y, g^k, (H(m) - y r)/k \pmod{\#\mathcal{G}}, z \oplus m)$

with $r = X^{y+k}$ and $z = G(r)$

• decryption: 1) $r$ = atomic_action[$(c_1 c_2)^x$]

2) $m = G(r) \oplus c_4$

3) if $g^{H(m)} = c_1^{r} \cdot c_2^{c_3}$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2', c_3', c_4') = (c_1, c_2, c_3', c_4)$ with $c_3' \neq c_3$

2) recover $m = m'$

Tsiounis and Yung [27]

• encryption: $(c_1, c_2, c_3, c_4) = (g^y, X^y \cdot m, g^k, y \cdot H(g, c_1, c_2, c_3) + k)$

• decryption: 1) if $g^{c_4} \neq c_1^{H(g, c_1, c_2, c_3)} \cdot c_3$ then output reject

2) output $m$ = atomic_action[$c_1^{-x}$] $\cdot\, c_2$

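Cramer and Shoup [8]

• encryption: $(c_1, c_2, c_3, c_4) = (g_1^k, g_2^k, X_1^k \cdot m, X_2^k \cdot X_3^{k\alpha})$
with $X_1 = g_1^z$, $X_2 = g_1^{x_1} \cdot g_2^{x_2}$ and $X_3 = g_1^{y_1} \cdot g_2^{y_2}$

• decryption: 1) $\alpha = H(c_1, c_2, c_3)$

2) $v$ = atomic_action[$c_1^{x_1 + y_1 \alpha} \cdot c_2^{x_2 + y_2 \alpha}$]

3) if $c_4 = v$ then output $m$ = atomic_action[$c_1^{-z}$] $\cdot\, c_3$
else output reject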

Fujisaki and Okamoto [12]

• encryption: $(c_1, c_2) = (g^{H(m \| s)}, (m \| s) \oplus X^{H(m \| s)})$

• decryption: 1) $m \| s$ = atomic_action[$c_1^x$] $\oplus\, c_2$

2) if $c_1 = g^{H(m \| s)}$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2') = (c_1, c_2 \oplus r)$ for a random $r$

2) recover $m$ from $(m' \| \cdots) \oplus r = m \| \cdots$

Fujisaki and Okamoto [13]

• encryption: $(c_1, c_2, c_3) = (g^{H(s,m)}, X^{H(s,m)} \cdot s, \mathcal{E}^{sym}_{G(s)}(m))$

• decryption: 1) $s$ = atomic_action[$c_1^{-x}$] $\cdot\, c_2$

2) $m = \mathcal{D}^{sym}_{G(s)}(c_3)$

3) if $c_2 = X^{H(s,m)} \cdot s$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2', c_3') = (g^r \cdot c_1, X^r \cdot c_2, c_3)$ for a random $r$

2) recover $m = m'$

Pointcheval [20]

• encryption: $(c_1, c_2, c_3) = (g^{H(m \| s)}, X^{H(m \| s)} \cdot k, (m \| s) \oplus G(k))$

• decryption: 1) $m \| s = G($atomic_action[$c_1^{-x}$] $\cdot\, c_2) \oplus c_3$

2) if $c_1 = g^{H(m \| s)}$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2', c_3') = (c_1, c_2, c_3 \oplus r)$ for a random $r$

2) recover $m$ from $(m' \| \cdots) \oplus r = m \| \cdots$

Baek, Lee, and Kim [1]

• encryption: $(c_1, c_2) = (g^{H(m \| s)}, (m \| s) \oplus G(X^{H(m \| s)}))$

• decryption: 1) $m \| s = G($atomic_action[$c_1^x$]$) \oplus c_2$

2) if $c_1 = g^{H(m \| s)}$ then output $m$ else output reject

• attack: 1) set $(c_1', c_2') = (c_1, c_2 \oplus r)$ for a random $r$

2) recover $m$ from $(m' \| \cdots) \oplus r = m \| \cdots$
Schnorr and Jakobsson [23]

• encryption: $(c_1, c_2, c_3, c_4) = (g^y, G(X^y) + m, H(g^s, c_1, c_2), s + c_3 \cdot y)$

• decryption: 1) if $c_3 \neq H(g^{c_4} \cdot c_1^{-c_3}, c_1, c_2)$ then output reject

2) output $m = c_2 - G($atomic_action[$c_1^x$]$)$

Okamoto and Pointcheval [19]

• encryption: $(c_1, c_2, c_3, c_4) = (g^y, X^y \oplus R, \mathcal{E}^{sym}_{G(R)}(m), H(R, m, c_1, c_2, c_3))$

• decryption: 1) $R$ = atomic_action[$c_1^x$] $\oplus\, c_2$

2) $m = \mathcal{D}^{sym}_{G(R)}(c_3)$

3) if $c_4 = H(R, m, c_1, c_2, c_3)$ then output $m$
else output reject

• attack: 1) set $(c_1', c_2', c_3', c_4') = (c_1, c_2, c_3, c_4')$ with $c_4' \neq c_4$

2) recover $m = m'$



Table 1. Analysis of several ElGamal variants.

ElGamal variant                      Type                     Attack
Zheng and Seberry I, II, III [28]    Decrypt-then-validate    Yes
Tsiounis and Yung [27]               Validate-then-decrypt    No
Cramer and Shoup [8]                 Validate-then-decrypt    No
Fujisaki and Okamoto [12]            Decrypt-then-validate    Yes
Fujisaki and Okamoto [13]            Decrypt-then-validate    Yes
Pointcheval [20]                     Decrypt-then-validate    Yes
Baek, Lee, and Kim [1]               Decrypt-then-validate    Yes
Schnorr and Jakobsson [23]           Validate-then-decrypt    No
Okamoto and Pointcheval [19]         Decrypt-then-validate    Yes



Table 1 summarizes the cryptographic characteristics of the previously described
schemes. According to the table, the “decrypt-then-validate”-type schemes are
all susceptible to our extended attacks. This is certainly the case when a
component $c_i$ in the ciphertext is especially dedicated to the validity test (e.g., as in [28,
II and III] or [19]); in that case, it suffices to probe the decryption oracle with
the target ciphertext where component $c_i$ is replaced by an arbitrary component
$c_i' \neq c_i$.
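
The XOR-masking attacks listed above (e.g., on Zheng and Seberry I) can be sketched on a toy instance. Everything concrete in the sketch is an assumption made for illustration: the small group modulo 467, the SHA-256-based stand-ins for $G$ and $H$, the 8-byte message and tag lengths, and the helper `decrypt_with_dump`, which models a dump of the unmasked value $m \| t$ taken before the validity check.

```python
import hashlib

p, g = 467, 2                       # toy group (illustrative only)
x = 127                             # private key
X = pow(g, x, p)                    # public key
MLEN, TLEN = 8, 8                   # bytes of m and of the tag H(m)


def G(elem):                        # stand-in "generator" function
    return hashlib.sha256(b"G" + str(elem).encode()).digest()[:MLEN + TLEN]


def H(m):                           # stand-in hash function
    return hashlib.sha256(b"H" + m).digest()[:TLEN]


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def encrypt(m, y):
    """(c1, c2) = (g^y, G(X^y) xor (m || H(m)))."""
    return pow(g, y, p), xor(G(pow(X, y, p)), m + H(m))


def decrypt_with_dump(c1, c2):
    """Return (output, dump); dump is m||t as it sits in memory before the check."""
    mt = xor(G(pow(c1, x, p)), c2)  # only c1^x is inside the atomic action
    m, t = mt[:MLEN], mt[MLEN:]
    return (m if t == H(m) else None), mt


m = b"yum-cha!"
c1, c2 = encrypt(m, y=213)                       # target ciphertext
r = bytes(range(1, MLEN + TLEN + 1))             # the adversary's mask
out, dumped = decrypt_with_dump(c1, xor(c2, r))  # query (c1, c2 xor r)
recovered = xor(dumped, r)[:MLEN]                # (m' || ...) xor r = m || ...
print(recovered == m)
```

Whether the oracle finally accepts or rejects the masked ciphertext is irrelevant here: the dumped value already determines $m$ once the mask $r$ is removed.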

Note that the attacks we described are very powerful. Even if the whole
decryption operation (excluding the validation checking) is performed through
an atomic_action, our attacks are still successful. This, however, does not mean
that it is impossible to construct a “decrypt-then-validate”-type scheme secure
against strong adaptive chosen-ciphertext attacks. For example, we were not able
to find an attack on the following modification of the Baek, Lee, and Kim scheme
wherein the $\oplus$ operator is replaced by a pair of symmetric encryption/decryption
algorithms $(\mathcal{E}^{sym}, \mathcal{D}^{sym})$ like DES.
• encryption: $(c_1, c_2) = (g^{H(m \| s)}, \mathcal{E}^{sym}_{G(X^{H(m \| s)})}(m \| s))$

• decryption: 1) $m \| s$ = atomic_action[$\mathcal{D}^{sym}_{G(c_1^x)}(c_2)$]

2) if $c_1 = g^{H(m \| s)}$ then output $m$ else output reject

(Note that we did not prove the security of the scheme.)

This modified scheme is nevertheless less satisfactory than a scheme wherein
only the private-key operations in the decryption process are run “atomically”
such as in $m \| s = \mathcal{D}^{sym}_{G(\text{atomic\_action}[c_1^x])}(c_2)$. Unfortunately, this latter scheme is
insecure. Submitting $(c_1', c_2') = (c_1, c_2')$ with $c_2' \neq c_2$ to the decryption oracle, the
adversary obtains the value of $R :=$ atomic_action[$c_1^x$] from a “dump” query
and therefore recovers $m$ as the most significant $|m|$ bits of $\mathcal{D}^{sym}_{G(R)}(c_2)$.

The ElGamal variants by Tsiounis and Yung [27] and by Schnorr and Jakob- 
sson [23] are actually signed encryption schemes. The security proofs can be 
extended to the “strong adaptive chosen-ciphertext attack” scenario since the 
validity is checked prior to and separately from the decryption operation itself. 

For the scheme by Cramer and Shoup [8], things are slightly different. Some 
secret data are involved in the validity test. As a consequence, not only the 
operations using the private decryption-key (i.e., z) but also the operations using 
the private “validation-keys” (i.e., $(x_1, x_2)$ and $(y_1, y_2)$) need to be “wrapped up”
in an atomic_action. Thus, from our evaluation criteria, we can say that this 
latter scheme needs more protection than the schemes of Tsiounis and Yung, 
and of Schnorr and Jakobsson. 

4 Conclusion 

Many security problems are often viewed as “implementation errors”. We believe
that those could be more fruitfully viewed as cryptographic design errors. In
this paper we presented a new security model for encryption algorithms and
analyzed the security of several algorithms provably secure against (standard)
adaptive chosen-ciphertext attacks. Amongst other things, we showed that, in
view of higher-level application programs, “validate-then-decrypt”-type schemes
generally behave better in the face of our extended attacks.
Abstract. We generalize to any base $r > 2$ the Majority-Logic-Decodification
Algorithms already considered for $r = 2$ by Chin-Long Chen,
Robert T. Chien and Chao-Kai Liu [2]. The codes considered are generated
by $\Phi_n(r)$ where $\Phi_n(x)$ is the $n$th cyclotomic polynomial associated
to the polynomial $x^n - 1$. Hong's Decodification Algorithm [7] is also applicable
to these codes, but has considerably higher computational complexity.



1 Introduction 

Arithmetic-Modular Codes were first proposed by Diamond [4]. These codes
are useful on their own for error control in digital arithmetic, and recently such
procedures for eliminating errors in hardware arithmetic have been shown to be of
great cryptographic importance (see [1]). The arithmetic operations considered
(in particular addition) are carried out with numbers represented in the number
system in base $r$ ($r \in \mathbb{N}$, $r > 2$). One single error in addition can cause many
faulty digits in the final representation, arising from the carries in
the operation.

An Arithmetic-Modular AN-code of length $n$, with modulus $m$ and generator
$A$ is just the set $C(A, B) = \{AN \mid N \in \mathbb{N}, 0 \le N < B\}$, where $m = A \cdot B$. If
$m = r^n - 1$ (resp. $m = r^n + 1$) then $C(A, B)$ is a subgroup of $\mathbb{Z}/(m)$ and it is
called a Cyclic (resp. Negacyclic) AN-Code [3].

If an error pattern $E$ occurs then $R = AN + E$ is the received information,
where $AN$ stands for the correct codeword. First we need a suitable

* All three authors are supported by Junta de Castilla y León project “Construcciones
criptográficas basadas en códigos correctores”. The second author is also supported by
Dgicyt PB97-0471.
distance function since the Hamming metric used in the Information Transmission
Codes cannot be applied in the theory of arithmetic-modular codes [3,5,8,10].

For $x \in \mathbb{Z}$ we will denote the modular weight of $x$ by $W_m(x)$; it is defined
as the minimal number of nonzero digits in any representation $x \equiv \sum_i c_i r^i \pmod m$
where the integers $c_i$ are such that $|c_i| < r$ for all $i$, $r$ is the chosen base
and $c_i = 0$ if $i > i_0$ for a fixed index $i_0$ (see [5,11,10]).

The minimum distance of an arithmetic-modular AN-code in base $r$ is defined
as (see [10,11]):

$d = \min\{d(x, y) = W_m(x - y) \mid x, y \in C(A, B),\ x \neq y\}$

$\ \ = \min\{W_m(x) \mid x \in C(A, B),\ x \neq 0\}$

As usual, a code where the distance between any two different words is at least
$2t + 1$ is called a $t$-error correcting code. The Correcting Capacity is the maximum
$t$ for which $d \ge 2t + 1$, that is $t = \lfloor \frac{d-1}{2} \rfloor$. Thus a $t$-error-correcting AN-code
with minimum distance $d$ detects every error pattern $E$ with $W_m(E) \le d - 1$
and corrects it for $W_m(E) \le t$ (see [10]).

In order to compute the minimum distance we need the definition of Non
Adjacent Form (NAF) and Cyclic Nonadjacent Form (CNAF), as well as the
study of the exceptional cases ($x \in \mathbb{Z}$ is exceptional if $x \not\equiv 0 \bmod m$ but
$(r + 1)x \equiv 0 \bmod m$). These algorithms and the definition and properties of the
syndrome of an integer can be found in [5] and [8].

In the theory of Arithmetic-Modular AN-Codes, the main problem still un- 
solved is finding good decoding algorithms. In all the AN-codes considered in [2] 
it is possible to accomplish the decodification using majority logic. This method 
was already used by Chin-Long Chen, Robert T. Chien and Chao-Kai Liu in 
the binary case r = 2. In this paper we will generalize this problem to any base 
r > 2 for one, two and L steps [6]. 

2 Majority-Logic-Decodable Cyclic Arithmetic-Modular 
AN-Codes (1 Step) 

We consider the cyclic Arithmetic-Modular AN-Code $C(A, B)$ generated by
$A = \frac{r^n - 1}{r^{n_2} - 1}$ and having information rank $B = r^{n_2} - 1$, where the code length $n$ verifies
that $n = n_1 \cdot n_2$. We suppose $n_1 > 3$ and $n_2 > 2$. The minimum modular distance
of $C(A, B)$ is $d = n_1$; in fact:

$W_m(A) = W_m(1 + r^{n_2} + r^{2n_2} + \cdots + r^{(n_1-1)n_2}) = n_1$

Let $N \in \mathbb{N}$ be such that $1 \le N < B = r^{n_2} - 1$ and its CNAF
$N = c_0 + c_1 r + \cdots + c_{n_2-1} r^{n_2-1} \bmod m$. If we denote $N = c_0 c_1 \ldots c_{n_2-1}$, then the CNAF of
$AN$ is:

$AN = c_0 + c_1 r + \cdots + c_{n_2-1} r^{n_2-1} + c_0 r^{n_2} + \cdots + c_{n_2-1} r^{n-1}$

therefore:

$AN = \underbrace{c_0 c_1 \ldots c_{n_2-1}}_{1*n_2}\ \underbrace{c_0 c_1 \ldots c_{n_2-1}}_{2*n_2}\ \ldots\ \underbrace{c_0 c_1 \ldots c_{n_2-1}}_{n_1*n_2}$

where $x * y$ means block number $x$ with $y$ elements. Accordingly $C(A, B)$ detects
all error patterns $E$ with modular weight $W_m(E) \le n_1 - 1$ and corrects all error
patterns with modular weight $W_m(E) \le t = \lfloor \frac{n_1-1}{2} \rfloor$.

We recall that $R$ is the received information, i.e., $R = AN + E$ with $W_m(E) \le t$,
and $AN$ is the codeword to which $R$ is decoded; $E$ and $N$ are the values we want
to calculate. The proof of the next lemma is straightforward.

Lemma 1. Consider a code $C\left(\frac{r^n - 1}{r^{n_2} - 1},\ r^{n_2} - 1\right)$ and let $R = AN + E$. Then $N = 0$
if and only if $W_m(R) \le t = \lfloor \frac{n_1-1}{2} \rfloor$.

If $N$ is an integer, $0 < N < B$, we consider its usual expression in base $r$,
$0 \le a_i < r$, $i = 0, 1, 2, \ldots, n_2 - 1$, i.e., $N = a_0 a_1 \ldots a_{n_2-1}$. (Note that digits
$a_i$ are used in the standard $r$-ary expression and digits $c_i$ stand for the CNAF.)
Then we have:

$AN = \underbrace{a_0 a_1 \ldots a_{n_2-1}}_{1*n_2}\ \underbrace{a_0 a_1 \ldots a_{n_2-1}}_{2*n_2}\ \ldots\ \underbrace{a_0 a_1 \ldots a_{n_2-1}}_{n_1*n_2}$   (4)

i.e., $n_1$ identical blocks, each of them with $n_2$ elements.

Lemma 2. A single error $E$ in any position of the expression in (4) of any
nonzero word of $C(A, B)$ can modify at most $n_2$ consecutive digits of that word.

Proof. Assuming that $0 < N < B = r^{n_2} - 1$, there is at least one nonzero digit $a_i$ and
at least one digit $a_j$ such that $a_j < r - 1$. This allows the definition of $h$ and $j$ as
follows:

$h = \min\{h' \in \{0, 1, \ldots, n_2 - 1\} \mid a_{h'} < r - 1\}$

$j = \min\{j' \in \{0, 1, \ldots, n_2 - 1\} \mid a_{j'} > 0\}$

Since $W_m(E) = 1$ we can write $E = b_i r^{k n_2 + i} \bmod m$ where $0 < |b_i| < r$ and
the index $k$ shows us the block that the error $E$ is affecting.

$R = \underbrace{a_0 a_1 \ldots a_{n_2-1}}_{1*n_2}\ \ldots\ \underbrace{a_0 a_1 \ldots (a_i + b_i) \ldots a_{n_2-1}}_{k*n_2}\ \ldots\ \underbrace{a_0 a_1 \ldots a_{n_2-1}}_{n_1*n_2}$   (6)

According to the sign of the digit $b_i$ we are able to distinguish two cases:

1. Case 1: $b_i > 0$

If $a_i + b_i < r$ then (6) is the usual expression of $R$ in base $r$ and the error $E$
will modify the word $AN$ only in one digit, the one in the position $k n_2 + i$.
But if $a_i + b_i \ge r$, the block $k$ in (6) will be:

$\underbrace{a_0 a_1 \ldots (a_i + b_i - r)\,(a_{i+1} + 1) \ldots a_{n_2-1}}_{k*n_2}$   (7)

so that if $a_{i+1} + 1 < r$ it can be considered as the usual expression of $R$ in
base $r$ and will not modify the following digits, otherwise (i.e. if $a_{i+1} + 1 = r$)
the block $k$ will be

$\underbrace{a_0 a_1 \ldots (a_i + b_i - r)\ 0\ (a_{i+2} + 1) \ldots a_{n_2-1}}_{k*n_2}$   (8)

and so on.

This process of “carry propagation” will finish at the first digit $a_j$ such that
$a_j < r - 1$; strictly speaking, at the digit $h$ either belonging to the block $k$
(if $i < h$) or to the block $k + 1$ (if $i \ge h$). Therefore it can never corrupt
more than $n_2$ consecutive digits from the position $k n_2 + i$.

2. Case 2: $b_i < 0$

If $a_i + b_i \ge 0$ the usual expression of $R$ in base $r$ is (6). (Notice that we have
$a_i + b_i < r$ for $b_i < 0$.) The error $E$ will have modified only one digit in $AN$.
But if $a_i + b_i < 0$ then the block $k$ of (6) will be:

$\underbrace{a_0 a_1 \ldots (a_i + b_i + r)\,(a_{i+1} - 1) \ldots a_{n_2-1}}_{k*n_2}$   (9)

So, if $a_{i+1} - 1 \ge 0$ it can never corrupt subsequent digits but if $a_{i+1} - 1 < 0$
the block $k$ will be:

$\underbrace{a_0 a_1 \ldots (a_i + b_i + r)\,(a_{i+1} - 1 + r)\,(a_{i+2} - 1) \ldots a_{n_2-1}}_{k*n_2}$   (10)

and so on.

At this stage, the process of “carry propagation” will be finished when the
first digit $a_j > 0$ is found. This process will finish in the digit $a_j$ of the block
$k$ (if $i < j$) or of the block $k + 1$ (if $i \ge j$) at most. Consequently it can
never modify more than $n_2$ consecutive digits of the usual expression of the
codeword $AN$. Having in mind that block $k + 1$ is the same as block 1
when $k = n_1$, the proof of the lemma is complete. □
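
The carry propagation described in the proof can be observed numerically. The following sketch is only an illustration, not part of the proof: the helper names `digits` and `changed_positions` are hypothetical, and the parameters $r = 3$, $n_1 = 5$, $n_2 = 4$, $N = 71$ are chosen for convenience. Digits are listed least significant first.

```python
def digits(x, r, n):
    """Usual r-ary digits of x, least significant first, padded to n digits."""
    out = []
    for _ in range(n):
        x, d = divmod(x, r)
        out.append(d)
    return out


def changed_positions(codeword, error, r, n):
    """Digit positions in which codeword and codeword + error (mod r^n - 1) differ."""
    m = r**n - 1
    before = digits(codeword, r, n)
    after = digits((codeword + error) % m, r, n)
    return [i for i in range(n) if before[i] != after[i]]


r, n1, n2 = 3, 5, 4
n = n1 * n2
A = (r**n - 1) // (r**n2 - 1)
AN = A * 71                                       # blocks 2212 2212 2212 2212 2212

print(changed_positions(AN, 3**3, r, n))          # carry: [3, 4, 5, 6]
print(changed_positions(AN, -2 * 3**10, r, n))    # borrow: [10, 11]
```

In the first call a single positive error at position 3 propagates a carry through three further digits; in the second a negative error at position 10 borrows from position 11 only.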

The lemma above provides the tools necessary for proving the theorem and the
decodification algorithm below. It is only necessary to take into account that
$n_1 - t > t$.

Theorem 1. Let $C(A, B)$ be a cyclic arithmetic-modular AN-code generated by
$A = \frac{r^n - 1}{r^{n_2} - 1}$ where the length $n$ of the code verifies that $n = n_1 n_2$, and let $E \in \mathbb{Z}$,
$W_m(E) \le t = \lfloor \frac{n_1-1}{2} \rfloor$, where $R = AN + E$ is the received information. We
will consider the digits of positions $k, n_2 + k, 2n_2 + k, \ldots, (n_1 - 1)n_2 + k$ in the
usual $r$-ary expansion of $R$. Then at least $n_1 - t$ of those digits are equal, and
they correspond to those in the same position as in the $r$-ary expansion of the
codeword $AN$. This holds for $k = 0, 1, 2, \ldots, n_2 - 1$.

The proof follows directly from the lemma above. This result allows us to state the
following decoding algorithm:
Algorithm 1 (Majority-Logic Decoding (One Step))

• If $W_m(R) \le t$ then return $N = 0$ and $E = R$ fi

• If $W_m(R) > t$ then do
    For $k = 0, \ldots, n_2 - 1$ do

      Consider the positions $k, n_2 + k, 2n_2 + k, \ldots, (n_1 - 1)n_2 + k$

      $R = \underbrace{* \ldots d_1 \ldots *}_{1*n_2}\ \underbrace{* \ldots d_2 \ldots *}_{2*n_2}\ \ldots\ \underbrace{* \ldots d_{n_1} \ldots *}_{n_1*n_2}$

      Take as the “true value” $t_k$ of $a_k$ the most repeated value among the values
      in the multiset $\{d_1, \ldots, d_{n_1}\}$.

    od

    return $N = t_0 t_1 \ldots t_{n_2-1}$ fi

Example 1. $r = 3$, $n_1 = 5$, $n_2 = 4$, $n = 20$,
$A = \frac{3^{20} - 1}{3^4 - 1} = 43.584.805$, $B = 3^4 - 1$,
$m = AB = 3^{20} - 1 = 3.486.784.400$, $d = n_1 = 5$ and $t = 2$.

Let $R = 3.094.403.084$ be the received information with $w_m(R) = 5 > t$. Its
3-ary expression (least significant digit first, in $n_1 = 5$ blocks of
$n_2 = 4$ digits) is:

    R = 2210 0022 2221 2212 2212
        1*4  2*4  3*4  4*4  5*4
          ↓ decoding algorithm
    N = 2212

Therefore $N = 71$ and $E = R - AN = -118.071 = 3^3 - 2 \cdot 3^{10}$. Note that
$w_m(3^3 - 2 \cdot 3^{10}) = 2 \le t$.
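
The one-step decoder can be checked numerically on Example 1. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the received word is assumed to lie in $[0, m)$, and the initial test $w_m(R) \le t$ is omitted because it needs a modular-weight routine that is not described here.

```python
from collections import Counter

def digits_lsb_first(value, base, length):
    """Return `length` base-`base` digits of a non-negative integer,
    least significant digit first."""
    out = []
    for _ in range(length):
        value, digit = divmod(value, base)
        out.append(digit)
    return out

def majority_decode(R, r, n1, n2):
    """One-step majority-logic decoding of R = A*N + E (hypothetical helper).

    The n = n1*n2 digits of R are split into n1 blocks of n2 digits; for each
    position k inside a block, the most frequent digit across blocks is kept.
    Returns the pair (N, E).
    """
    n = n1 * n2
    A = (r**n - 1) // (r**n2 - 1)
    d = digits_lsb_first(R, r, n)
    N = 0
    for k in range(n2):
        column = [d[block * n2 + k] for block in range(n1)]
        t_k = Counter(column).most_common(1)[0][0]  # majority vote
        N += t_k * r**k
    return N, R - A * N

# Parameters of Example 1
N, E = majority_decode(3094403084, r=3, n1=5, n2=4)
print(N, E)
```

The majority vote is unambiguous whenever $w_m(E) \le t$, since at least $n_1 - t > t$ of the $n_1$ digits in each column then agree.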

3 Majority-Logic-Decodable Cyclic Arithmetic-Modular 
AN-Codes (L Steps) 

Let $n$ be a positive integer such that $n = \prod_{i=1}^{L} p_i$, where the
distinct primes $p_i$ satisfy the condition $p_i > 2^{i-1}(p_1 - 1) + 1$ for
$i = 2, 3, \ldots, L$, and $p_1 > 2$ in order to get $w_m(E) > 0$. It is a well
known fact [9] that the polynomial $x^n - 1$ can be expressed as the product of
cyclotomic polynomials over the rational field, i.e.,
$x^n - 1 = \prod_{d \mid n} \Phi_d(x)$. In particular, we can write
$r^n - 1 = \prod_{d \mid n} \Phi_d(r)$. Note also that although $\Phi_d(x)$ is
irreducible over $\mathbb{Q}[x]$, $\Phi_d(r)$ might not be a prime number. One
can consider the cyclic arithmetic-modular AN-code $C(A,B)$ with generator
$A = \Phi_n(r)$ and rank of information
$B = \frac{r^n - 1}{\Phi_n(r)} = \prod_{d \mid n,\, d \ne n} \Phi_d(r)$.

We will show that the AN-code corrects all error patterns $E$ with
$w_m(E) \le \frac{p_1 - 1}{2}$. This will lead us to the conclusion that
$d = p_1$, and to a majority-logic decoding algorithm for such a code in $L$
steps.
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Let R = AN -I- i? be the received information where Wm{E) < ^ . Multi- 
plying both sides of the equation by Y\^=i we obtain 



L-l 



L -1 



i? n - 1) + ^) n 



We denote Rl ='■ 
the integers: 



=A 

(r- - 1) 



L-l L-l 

- l) iV -h l; 



n - 1) 



( 11 ) 






where mod m = r” — 1, and we define 

nti {rE - l) 



r" - 1 ^ B 

Al = ^: , Bl = tpl Kl = —, Ml = 

rPL — \ dl 

It is not difficult to show that: 

L-l 

Al=AKl, Y[ - l) = KlMl, (r^ - l) \Rl 

2=1 



Replacing in equation (11) we obtain: 



L-l 



L-l 



R 



P (rn - l) =AKlMlN + S P (r^ - l) 

■ ^ 2=1 

L-l 

=AlMlN + Li P (r^ - l) 



(12) 



2=1 



therefore if M =: M^N and =: 



EY[ti{rE-l) 



, it follows that 



Rl = \Al\M\bl + El\j 



(13) 



Lemma 3. Wm{EL) < 

Proof. 

WmiEr) =Wm (l; (rw - . . (r”r-i - l)) 

<2Wm {e (r^ — l) . . . — l)) 

<2^Wm (^E (r^ — l) . . . ^r^r-1 — l)) < • ■ • 
<2^-^Wr,{E) < 

(2^-Upi - 1) + 1) - 1 ^ PL-1 
2 - 2 

Note that the condition pr> 2^~^{pi — 1) + 1 has been used. □ 
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In addition, we have Rl = \Al\M\bl + EL\m where Wmi,EL) < and 

dminC'(^L, ^l) = PL- Using the algorithm 1 in previous section we will be able 
to find Nb, where 0 < < B and such that Rl = + E'j^ and 



E'l = E 



1^ — i. 

n H 



From equation above it follows that there will be an integer bL-i such that: 



E'l=E 



L,— I 

[rfl - l) + bL-i (r” - 1) 



rPL-i _ 1 



L, — Z 



r” - 1 

L n 

-l-PL-1 _ 1 



If we define 



- I 

Al-1 = , Bl-i = — I, Rl-1 = 

rPL-i - 1 



rPL-i - 1 1 



Er, = E 



n (" - 1) 



reducing (15) mod m we obtain 



Rl- 1 = \AL-l\bL-l\m + El-i\ 



Bearing in mind that pl-i > — 1) + I and following the previous 

procedures we can apply the algorithm 1 by using the AN-code C{Al-i, Bl-i) 
whose minimum distance is d = pl-i- By a recurrent process we will obtain in 
the L — 1 step the following relations: 

r” - 1 ^ 

i ?2 = A 2 N 2 + E '2 where A 2 = , B 2 = rP 2 — 1, 

r P2 — 1 

E '2 = E{r^ — 1) mod m and (rFT — l)|if 2 



Therefore — = E + b\ ^ where b\ G Z. If we let Ai = Ul ^ , Bi = 

rPi —1 rPi —1 rPi —1 

— 1, i?i = — and E\ = \E\m', then R\ = |Ai|6i|m + Ei\m with 

m 

Wm{Ei) = Wm{E) < 

Thus if we apply the algorithm 1 on the section above to i?i using the AN- 
code C{Ai, Bi), whose minimum distance is pi , there will be A^i G Z, 0 < A^i < 
B and such that Ri = AiA^i -|- where E'^ = E mod m, and this is the 
error that had been made. Hence we obtain the following L-step algorithm:

Algorithm 2 (Majority-Logic Decoding Algorithm (L steps))

• Input $R$

• If $w_m(R) \le \frac{p_1 - 1}{2}$ then return $N = 0$ and $R = E$ fi

• If $w_m(R) > \frac{p_1 - 1}{2}$ then do
    Compute $R_L = \left| R \prod_{i=1}^{L-1} (r^{n/p_i} - 1) \right|_m$
    For $s = L, \ldots, 1$ stepsize $= -1$ do
      Make majority decision (algorithm 1) on $R_s$ in $C(A_s, B_s)$ to get
      $R_s = A_s N_s + E'_s$, where $0 \le N_s < B_s$
      $R_{s-1} = \left| \frac{E'_s}{r^{n/p_{s-1}} - 1} \right|_m$ (for $s > 1$)
    od
    return $E = E'_1 \bmod m$ and $N = \frac{R - E}{A}$ fi



Corollary 1. The minimum distance of the Cyclic AN-code generated by A = 
(fnir) is d = Pi, where the length n = Y[f=iPi* the distinct primes pi verify 
the condition pi > — 1) -|- 1, for i = 2, . . . , L and pi > 2. 

Proof. We have shown that C{A,B) may correct all error pattern E with 
Wm{E) < From this fact we obtain that d > 2 -I- 1 = Pi (see 

[10]). Now we consider the integer N = — — . It follows that Wm{AN) = 

(rPl -1)^ 

/ n_, \ n_, jn_ ^ (pi~l)ra 

Wm '"jT. = Pi because the CNAF of =i-|- 7 -Pi-|- 7 -Pi-| \-r pi 

\rPi-l/ rPi-l 

has pi nonzero digits. Thus d <p\ and hence d = pi.\A 



4 Particular Case: L = 2 



This algorithm also holds if $n = \prod_{i=1}^{L} n_i$ is a decomposition of $n$
into relatively prime integers with the additional condition
$n_i > 2^{i-1}(n_1 - 1) + 1$ for $i = 2, \ldots, L$ and $n_1 > 1$. If $L = 2$,
then $n = n_1 n_2$ where $n_1, n_2 \in \mathbb{N}$, $\gcd(n_1, n_2) = 1$ and
$n_2 > 2n_1 + 1$. The generator of the arithmetic-modular AN-code $C(A,B)$ will
be

$$A = \frac{(r^n - 1)(r - 1)}{(r^{n_1} - 1)(r^{n_2} - 1)}.$$

Example 2. $r = 4$, $n_1 = 3$, $n_2 = 8$, $n = 24$,

$$A = \frac{(4^{24} - 1)(4 - 1)}{(4^3 - 1)(4^8 - 1)} = 204.525.373, \qquad
  B = \frac{(4^3 - 1)(4^8 - 1)}{4 - 1} = 1.376.235,$$

$m = AB = 4^{24} - 1 = 281.474.976.710.655$, $d = n_1 = 3$ and $t = 1$.

Let $R = 4.704.075.387$ be the received information with $w_m(R) = 11 > t$.

• Step 1. Compute $R_2 = |R(4^8 - 1)|_m = 26.806.603.776.390$, with

$$A_2 = \frac{4^{24} - 1}{4^3 - 1} = 4.467.856.773.185, \qquad B_2 = 4^3 - 1 = 63.$$

The expression of $R_2$ in base 4 (least significant digit first) is:

    R_2 = 210 210 020 210 212 110 210 210
          1*3 2*3 3*3 4*3 5*3 6*3 7*3 8*3
            ↓ decoding algorithm 1
    N_2 = 210

so $N_2 = 2 + 1 \cdot 4 + 0 \cdot 4^2 = 6$ and
$E'_2 = R_2 - A_2 N_2 = -536.862.720$.

• Step 2. Compute
$R_1 = \left| \frac{E'_2}{4^8 - 1} \right|_m = 281.474.976.702.463$, with

$$A_1 = \frac{4^{24} - 1}{4^8 - 1} = 4.295.032.833, \qquad B_1 = 4^8 - 1 = 65.535.$$

Therefore, expressing $R_1$ in base 4:

    R_1 = 33333313 33333333 33333333
          1*8      2*8      3*8
            ↓ decoding algorithm 1
    N_1 = 33333333

$N_1 = 65.535$ and $E'_1 = R_1 - A_1 N_1 = -8.192 = -2 \cdot 4^6$, hence our
algorithm yields:

$$N = \frac{R - E}{A} = 23, \qquad E = -2 \cdot 4^6 \bmod m.$$
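
The two steps of Example 2 can be replayed with exact integer arithmetic. This is a sketch under stated assumptions rather than the authors' code: `majority_decode` is a hypothetical helper that takes the block layout explicitly, and the weight test on $R$ is again omitted.

```python
from collections import Counter

def majority_decode(R, A, r, blocks, block_len):
    """Majority vote over `blocks` blocks of `block_len` base-r digits
    (least significant first). Returns (N, R - A*N)."""
    digits, value = [], R
    for _ in range(blocks * block_len):
        value, digit = divmod(value, r)
        digits.append(digit)
    N = 0
    for k in range(block_len):
        column = [digits[b * block_len + k] for b in range(blocks)]
        N += Counter(column).most_common(1)[0][0] * r**k
    return N, R - A * N

r, n1, n2 = 4, 3, 8
m = r**(n1 * n2) - 1
A = m * (r - 1) // ((r**n1 - 1) * (r**n2 - 1))
R = 4704075387

# Step 1: multiply by (r^n2 - 1) and decode in the code with 8 blocks of 3 digits
R2 = (R * (r**n2 - 1)) % m
A2 = m // (r**n1 - 1)
N2, E2 = majority_decode(R2, A2, r, blocks=n2, block_len=n1)

# Step 2: divide the error by (r^n2 - 1) (exact here) and decode with 3 blocks of 8
R1 = (E2 // (r**n2 - 1)) % m
A1 = m // (r**n2 - 1)
N1, E1 = majority_decode(R1, A1, r, blocks=n1, block_len=n2)

E = E1
N = (R - E) // A
print(N, E)
```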

Abstract. Standard runlength-limiting codes - nonlinear codes defined
by trellises - have the disadvantage that they disconnect the outer
error-correcting code from the bit-by-bit likelihoods that come out of the
channel. I present two methods for creating transmissions that, with
probability extremely close to 1, both are runlength-limited and are codewords
of an outer linear error-correcting code (or are within a very small Hamming
distance of a codeword). The cost of these runlength-limiting methods, in terms
of loss of rate, is significantly smaller than that of standard
runlength-limiting codes. The methods can be used with any linear outer
code; low-density parity-check codes are discussed as an example.

The cost of the method, in terms of additional redundancy, is very small:
a reduction in rate of less than 1% is sufficient for a code with blocklength
4376 bits and maximum runlength 14.



This paper concerns noisy binary channels that are also constrained channels,
having maximum runlength limits: the maximum number of consecutive 1s
and/or 0s is constrained to be $r$. The methods discussed can also be applied to
channels for which certain other long sequences are forbidden, but they are not
applicable to channels with minimum runlength constraints such as maximum
transition-run constraints.

I have in mind maximum runlengths such as $r = 7$, 15, or 21. Such constraints
have a very small effect on the capacity of the channel. (The capacity of a
noiseless binary channel with maximum runlength $r$ is about $1 - 2^{-r}$.)
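
The size of this capacity loss can be checked numerically. The sketch below assumes the standard counting argument, which the text does not spell out: a sequence whose runs of identical symbols all have length at most $r$ is a concatenation of runs of length 1 to $r$, so the number of such sequences grows like $\lambda^n$ with $\sum_{k=1}^{r} \lambda^{-k} = 1$, and the capacity is $\log_2 \lambda$. The function name is hypothetical.

```python
import math

def max_runlength_capacity(r, iterations=200):
    """Capacity in bits per symbol of a noiseless binary channel whose runs
    of identical symbols have length at most r (r >= 2).

    Solves sum_{k=1..r} lam**(-k) = 1 for lam in (1, 2) by bisection.
    """
    lo, hi = 1.0, 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(mid ** -k for k in range(1, r + 1)) > 1.0:
            lo = mid  # the sum decreases with lam, so the root is above mid
        else:
            hi = mid
    return math.log2(0.5 * (lo + hi))

for r in (7, 15, 21):
    print(r, 1.0 - max_runlength_capacity(r), 2.0 ** -r)
```

Under this model the loss $1 - C$ comes out somewhat below $2^{-r}$, which agrees with the order of magnitude quoted above.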

There are two simple ways to enforce runlength constraints. The first is to use 
a nonlinear code to map, say, 15 data bits to 16 transmitted bits [8]. The second is 
to use a linear code that is guaranteed to enforce the runlength constraints. The 
disadvantage of the first method is that it separates the outer error-correcting 
code from the channel: soft likelihood information may be available at the chan- 
nel output, but once this information has passed through the inner decoder, its 
utility is degraded. The loss of bit-by-bit likelihood information can decrease the 
performance of a code by about 2 dB [3] . The second method may be feasible, 
especially if low-density parity-check codes are used, since they are built out of 
simple parity constraints, but it only gives a runlength limit r smaller than 16 if 
the outer code’s rate is smaller than the rates that are conventionally required 
for magnetic recording (0.9 or so) [7].

I now present two simple ideas for getting the best of both worlds. The
methods presented involve only a small loss in communication rate, and they
are compatible with the use of linear error-correcting codes. The methods do not
give an absolute guarantee of reliable runlength-limited communication; rather,
as in a proof of Shannon’s noisy channel coding theorem, we will be able to
make a statement like ‘with probability $1 - 10^{-20}$, this method will
communicate reliably and satisfy the $r = 15$ runlength constraint’. This
philosophy marries
nicely with modern developments in magnetic recording, such as (a) the use 
of low-density parity-check codes, which come without cast-iron guarantees but 
work very well empirically [7]; and (b) the idea of digital fountain codes [1], 
which can store a large file on disc by writing thousands of packets on the disc, 
each packet being a random function of the original file, and the original file 
being recoverable from (almost) any sufficiently large subset of the packets - 
in which case occasional packet loss is unimportant. The ideas presented here 
are similar to, but different from, those presented by Immink [4,5,6], Deng and 
Herro [2], and Markarian et al. [9]. 

1 Linear Method 

The first idea is a method for producing a codeword of the linear outer code 
that, with high probability, satisfies the runlength-limiting constraints. 

We assume that a good outer $(N, K)$ linear error-correcting code has been
chosen, and that it is a systematic code. We divide the $K$ source bits into
$K_u$ user bits and $K_r > \log_2 M$ additional special source bits that we will
set so as to satisfy the runlength constraints. The code has $M = N - K$ parity
bits. We choose the special source bits such that all $M$ of the parity bits are
influenced by at least one of the special bits. When transmitting the codeword
we order the bits as shown below, such that the $K_r + M$ special bits and
parity bits appear uniformly throughout the block. We call these $K_r + M$ bits
the pad bits.

[Figure: the transmitted block, with the $K_r + M$ pad bits spaced uniformly, a
distance $p$ apart, among the $K_u$ user bits.]


We modify the code by adding an offset o to all codewords. The random vector 
o is known to the sender and receiver and satisfies the runlength constraints. 
As we will see later, it may be useful for o to change from block to block; for 
example, in magnetic recording, it might depend on a seed derived from the 
sector number.

The method we will describe is appropriate for preventing runs of length
greater than $r$ for any $r$ greater than or equal to $p$, the distance between
pad bits. Two examples of the code parameters are given in table 1. $R$ is the
overall code rate from the user’s perspective. We intend the number of special
source bits $K_r$ to be very small compared with the blocklength. For
comparison, a standard rate 15/16 runlength-limiting inner code would
correspond to $K_r \simeq 256$ bits. If instead we use $K_r = 24$ then we are
using one tenth the number of bits to achieve the runlength constraint.

Table 1. Some feasible parameter settings.

    K_u                 4096    4096
    K_r                   20      24
    M                    296     432
    N                   4412    4552
    R = K_u/N          0.928     0.9
    p = N/(M + K_r)       14      10

The idea of the linear runlength-limiting method is that, once the user bits
have been set, there are very likely to be codewords that satisfy the runlength
constraints among the $2^{K_r}$ codewords corresponding to the $2^{K_r}$
possible settings of the special source bits.

Encoding Method 

We write the user data into the $K_u$ bits and note which, if any, of the
$K_r + M$ pad bits are forced by the runlength constraints to take on a
particular state. [If the maximum runlength, $r$, is greater than the spacing
between pad bits, $p$, then there may be cases where we are free to choose which
of two adjacent pad bits to force. We neglect this freedom in the following
calculations, noting that this means the probability of error will be
overestimated.]

For a given user block, the probability, averaging over offset vectors $o$, that
a particular pad bit is forced to take state 0 is the probability that it lies
in or adjacent to a run of $r$ 1s, which is approximately

$$\beta = r\,2^{-r}. \qquad (1)$$

The probability that a particular pad bit is forced to take state 1 is also
$\beta$. The expected number of forced pad bits in a block is thus
$2\beta(K_r + M)$. Table 2 shows that, for $r \ge 14$, it will be rare that more
than one or two pad bits are forced.
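
The estimate (1) and the expected number of forced pad bits are easy to evaluate directly. The following short sketch uses hypothetical function names and the value of 316 pad bits quoted in table 2.

```python
def beta(r):
    """Approximate probability, from equation (1), that a given pad bit is
    forced to take one particular state."""
    return r * 2.0 ** -r

def expected_forced_pad_bits(r, num_pad_bits):
    """Expected number of forced pad bits in a block: 2 * beta * (K_r + M)."""
    return 2 * beta(r) * num_pad_bits

for r in (14, 16, 20):
    print(r, f"{beta(r):.1e}", f"{expected_forced_pad_bits(r, 316):.2f}")
```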

Having identified the forced bits, we use linear algebra to find a setting of
the $K_r$ special bits such that the corresponding codeword satisfies the forced
bits, or, in the rare cases where no such codeword exists, to find a codeword
that violates the smallest number of them.

Table 2. Examples of the expected number of forced pad bits, $2\beta(M + K_r)$.

    r                      14           16           20
    β               8 × 10^-4    2 × 10^-4    2 × 10^-5
    K_r + M               316          316          316
    2β(M + K_r)           0.5         0.15         0.01
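
The linear-algebra step of the encoding method amounts to solving a small system over GF(2). The sketch below is an illustration under assumptions, not the author's encoder: it supposes that the forced pad bits depend on the special bits $s$ through a known binary matrix (one row per forced bit, stored as a bitmask), so that the requirement is `rows * s = rhs` with `rhs` the XOR of the required values and the values obtained with all special bits set to zero. All names are hypothetical.

```python
def solve_gf2(rows, rhs, num_unknowns):
    """Solve rows * s = rhs over GF(2) by Gaussian elimination.

    rows[i] is an int whose bit j is the coefficient of unknown j in
    equation i; rhs[i] is 0 or 1. Returns one solution as a list of bits
    (free unknowns set to 0), or None if the equations are inconsistent.
    """
    eqs = list(zip(rows, rhs))
    pivots = []  # entries (pivot column, row mask, rhs bit), kept reduced
    for col in range(num_unknowns):
        idx = next((i for i, (mask, _) in enumerate(eqs) if (mask >> col) & 1), None)
        if idx is None:
            continue  # no remaining equation involves this unknown
        pmask, pbit = eqs.pop(idx)
        # clear this column from the remaining equations and earlier pivots
        eqs = [(m ^ pmask, b ^ pbit) if (m >> col) & 1 else (m, b) for m, b in eqs]
        pivots = [(c, m ^ pmask, b ^ pbit) if (m >> col) & 1 else (c, m, b)
                  for c, m, b in pivots]
        pivots.append((col, pmask, pbit))
    if any(bit for _, bit in eqs):  # a leftover equation reads 0 = 1
        return None
    solution = [0] * num_unknowns
    for col, _, bit in pivots:
        solution[col] = bit
    return solution

# Two forced pad bits, three special bits (toy numbers)
rows = [0b011, 0b110]   # forced bit 0 depends on s0, s1; forced bit 1 on s1, s2
rhs = [1, 0]
print(solve_gf2(rows, rhs, 3))
```

A return value of `None` corresponds to the rare case in which no codeword satisfies all forced bits.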



1.1 Probability of Failure 

The probability that this scheme will fail to find a satisfactory codeword depends 
on the details of the outer code. We first make an estimate for the case of a 
low-density parity-check code; later, we will confirm this estimate by numerical 
experiments on actual codes. 

Let the columns of the parity-check matrix corresponding to the $K_r + M$
pad bits form a submatrix $F$. In the case of a regular low-density parity-check
code, this binary matrix will be sparse, having column weight 4, say, or about 8
in the case of a code originally defined over GF(16). If the code is an
irregular low-density parity-check code, these columns might be chosen to be the
higher weight columns; we will see that this would reduce the probability of
error.

Consider one row of $F$ of weight $w$. If the $w$ corresponding pad bits are
all constrained, then there is a probability of 1/2 that the parity constraint
corresponding to this row will be violated. In this situation, we can make a
codeword that violates one of the $w$ runlength constraints and satisfies the
others. The probability of this event happening is $(2\beta)^w$. For every row
of $F$, indeed for every non-zero codeword of the dual code corresponding to
$F$, there is a similar event to worry about. The expected number of runlength
constraints still violated by the best available codeword is thus roughly

$$\frac{1}{2} \sum_w A_F(w)\,(2\beta)^w, \qquad (2)$$

where $A_F(w)$ is the weight enumerator function of the code whose generator
matrix is $F$. For small $\beta$, this expectation is dominated by the
low-weight words of $F$, so, if there are $M$ words of lowest weight
$w_{\min}$, the expected number of violations is roughly

$$\frac{1}{2} M (2\beta)^{w_{\min}}. \qquad (3)$$

Table 3 shows this expected number for $w_{\min} = 4$ and 8.

For example, assuming $w_{\min} = 8$ (which requires a matrix $F$ whose columns
have weight 8 or greater), for a maximum runlength $r$ of 14 or more, we can get
the probability of failure of this method below $10^{-20}$.
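
The estimate (3) can be evaluated for the parameters used in this section. This is a minimal sketch with a hypothetical function name; it takes $\beta = r\,2^{-r}$ from equation (1) and $M = 296$ lowest-weight words.

```python
def expected_violations(r, num_low_weight_words, w_min):
    """Estimate (3): 0.5 * M * (2*beta)**w_min, with beta = r * 2**-r."""
    beta = r * 2.0 ** -r
    return 0.5 * num_low_weight_words * (2 * beta) ** w_min

for w_min in (4, 8):
    print(w_min, [f"{expected_violations(r, 296, w_min):.0e}" for r in (14, 16, 20)])
```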

Table 3. The expected number of violations for $w_{\min} = 4$ and 8.

    r                          14            16            20
    M                         296           296           296
    β                   8 × 10^-4     2 × 10^-4     2 × 10^-5
    (1/2) M (2β)^4      1 × 10^-9    8 × 10^-12    3 × 10^-16
    (1/2) M (2β)^8     1 × 10^-20    5 × 10^-25    7 × 10^-34

What the above analysis has not pinned down is the relationship between
the column weight of $F$ and $w_{\min}$. We now address this issue, assuming $F$
is a low-density parity check matrix. If $F$ has a row of weight $w$, then the
dual code has a word of weight $w$. Any linear combination of rows also gives a
dual codeword. Is it likely that the dual code has words of lower weight than
the already sparse rows that make up the parity check matrix? It would be nice
to know it is not likely, because then we could approximate the expected number
of violations (2) by

$$\frac{1}{2} \sum_w g(w)\,(2\beta)^w, \qquad (4)$$

where $g(w)$ is the number of rows of $F$ that have weight $w$. However, as $F$
becomes close to square, it becomes certain that linear combinations of those
low-weight rows will be able to make even lower weight dual words. The
approximation (4) would then be a severe underestimate.

We now test these ideas empirically. 



1.2 Explicit Calculation of the Probability of Conflict 



I took a regular low-density parity-check code with blocklength $N = 4376$ bits
and $M = 282$ (the true number of independent rows in the parity check matrix
was 281). The column weight was $j = 4$. In four experiments I allocated
$K_r = 11, 21, 31, 41$ of the source bits to be special bits and found
experimentally the conditional probability, given $w$, that a randomly selected
set of $w$ pad bits constrained by the runlength constraints would conflict
with the code constraints.

[Method: given w randomly chosen pad bits, Gaussian elimination was at- 
tempted to put the generator matrix into systematic form with respect to the 
chosen bits. This was repeated for millions of choices of the pad bits, and the 
probability of failure of Gaussian elimination was estimated from the results. 
The actual probability of failure will be smaller than this quantity by a fac- 
tor between 1 and 2, because it is possible, even though the pad bits are not 
independent, that the runlength constraints will be compatible with the code 
constraints.] 

Under a union-type approximation (like (4)) that only counts the dual
codewords that are rows of $F$, we would predict this conditional probability
to be

$$P(\text{conflict} \mid w) \simeq \sum_{w'} g(w') \binom{N - w'}{w - w'} \Big/ \binom{N}{w}. \qquad (5)$$

The empirically-determined conditional probability of failure, as a function of
the weight $w$ of the constraint, is shown in figure 1, along with the
approximation (5), shown by the lines without datapoints.




Fig. 1. Probability of failure of the linear runlength-limiting method as a
function of number of constrained pad bits, $w$. The four curves with points and
error bars show empirical results for $K_r = 11$, 21, 31, and 41. The four lines
show the approximation (5). All systems derived from the same regular
low-density parity-check code with $M = 281$ constraints and column weight
$j = 4$. The runlength limits for these four cases are
$r = N/(K_r + M) = 15$, 14.5, 14, and 13.6.



It can be seen that for $K_r = 11$ and 21, the approximation is an
underestimate, but for $K_r = 31$ and 41, it gives a snug fit in the important
area (i.e., low $w$). From the empirical results we can also deduce the
probability of failure, which is

$$P(\text{conflict}) = \sum_w P(w)\,P(\text{conflict} \mid w) \qquad (6)$$

$$\simeq \sum_w \binom{K_r + M}{w} (2\beta)^w (1 - 2\beta)^{K_r + M - w}\,P(\text{conflict} \mid w). \qquad (7)$$

Plugging in /3 = 8 X 10“^ (corresponding to constrained length r = 14), we find 
that for ATr = 21, 31, and 41, the probability of a conflict is about 10“^®. We 
will discuss how to cope with these rare failures below.

1.3 Further Experiments

I also explored the dependence on column weight by making three codes with
identical parameters $N$, $M$, $K_r$, but different column weights: $j = 3$,
$j = 4$, and $j \simeq M/2$ (a random code). Figure 2 shows that for $j = 4$,
the failure probability, at small $w$, is quite close to that of a random code.

Fig. 2. Probability of failure of the linear runlength-limiting method for
various column weights. The code parameters were $K_r = 25$, $M = 220$.



1.4 How to Make the Probability of Failure Even Smaller 

We showed above that our runlength-limiting method can have probability of 
failure about 10“^®. What if this probability of failure is too large? And what 
should be done in the event of a failure? I can suggest two simple options. First, 
in discdrive applications, if the offset vector o is a random function of the sector 
number where the block is being written, we could have an emergency strategy: 
when the optimally encoded block has runlength violations, leave a pointer in the 
current sector and write the file in another sector, where the coset o is different. 
This strategy would incur an overhead cost at write-time on the rare occasions 
where the writer has to rearrange the blocks on the disc. 

The second option is even simpler, as described in the following section.

2 Nonlinear Method



If the linear method fails to satisfy all the runlength constraints, a really dumb 
option is to modify the corresponding pad bits so that the runlength constraints 
are satisfied, and transmit the modified word, which will no longer be a codeword 
of the outer error-correcting code. As long as the number of flipped bits is not 
too great, the decoder of the outer code will be able to correct these errors. The 
average probability of a bit’s being flipped is raised by a very small amount 
compared with a typical noise level of The probability distribution of 

the number of flipped bits depends on the details of the code, but for a low- 
density parity-check code whose graph has good girth properties, we’d expect 
the probability of t flips to scale roughly as 

PoPi"\ (8) 



where po = and p\ = and no worse than 



Kr + M 
t 



Pop{ \ 



(9) 



which is roughly 

For runiin = 4, K„ + M = 316, and t = 6, for example, the probability of t 
errors would be roughly as shown in table 4. Thus as long as the outer code is 



Table 4. 



r 14 16 20 

(^■•+,m)* (2/3)'^x 3 2 X 4 X 5 x 10“’^'^ 



capable of correcting substantially more than 6 errors, the probability of failed 
transmission using this scheme is very low indeed. 



2.1 Use of the Nonlinear Method Alone 

For some applications, the dumb nonlinear scheme by itself might suffice. At 
least in the case of a low-density parity-check code, it is simple to modify the 
decoder to take into account the fact that the pad bits are slightly less reliable 
than the user bits. [We could even include in the belief propagation decoding 
algorithm the knowledge that the pad bit is least reliable when it sits in the 
middle of run.] 

Let the pad bits be the parity bits, i.e., let $K_r = 0$, use a random offset
vector $o$, and flip whichever pad bits need to be flipped to satisfy the
runlength constraints. The probability of a pad bit’s being flipped is $\beta$,
which was given
in table 2. If the ambient noise level is a bit-flip probability of 10“^, then for 
runlength constraints r greater than or equal to 16, the increase in noise level 
for the pad bits (from 10“^ to 10“^ -I- 0) is small enough that the average effect 
on performance will be small. For a $t$-error-correcting code, this dumb method
would suffer an absolutely uncorrectable error (i.e., one that cannot be
corrected at any noise level) with probability about
$M^{t+1}\beta^{t+1}/(t+1)!$. For $r = 16$ and $M = 316$, this probability is
shown in table 5.



Table 5. Probability of an absolutely uncorrectable error for the nonlinear
method and a $t$-error-correcting code with $r = 16$ and $M = 316$.

    t                                  5             9            19
    M^(t+1) β^(t+1) / (t+1)!   2 × 10^-10    2 × 10^-18    2 × 10^-41



Thus, the dumb method appears to be feasible, in that it can deliver a failure 
probability smaller than 10“^® with a t=9-error-correcting code. If it is coupled 
with the trick of having the offset o vary from sector to sector, then it appears 
to offer a cheap and watertight runlength limiting method, even with a weaker 
outer code: on the rare occasions when more than, say, 5 bits need to be flipped, 
we move the data to another sector; this emergency procedure would be used 
of order once in every billion writes. The only cost of this method is a slight 
increase in the effective noise level at the decoder. 
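
The estimate behind table 5, $M^{t+1}\beta^{t+1}/(t+1)!$, can be evaluated in a few lines. This is a sketch with a hypothetical function name; it takes $\beta = r\,2^{-r}$ from equation (1), so the leading digits may differ slightly from a table computed with a rounded $\beta$.

```python
import math

def uncorrectable_probability(t, r, num_pad_bits):
    """Approximate probability that more than t pad bits must be flipped:
    (M * beta)**(t + 1) / (t + 1)!, with beta = r * 2**-r."""
    beta = r * 2.0 ** -r
    return (num_pad_bits * beta) ** (t + 1) / math.factorial(t + 1)

for t in (5, 9, 19):
    print(t, f"{uncorrectable_probability(t, 16, 316):.1e}")
```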



3 Discussion 

In an actual implementation it would be a good idea to compute the weight 
enumerator function of the dual code defined by F and ensure that it has the 
largest possible minimum distance. 

The ideas in this paper can be glued together in several ways. 

- Special bits: various values of $K_r$ can be used, including $K_r = 0$ (i.e.,
  use the nonlinear method alone). The experiments suggest that increasing
  $K_r$ beyond $K_r = 20$ or 30 gives negligible decrease in the probability of
  conflict.

- Nonlinear bitflipping. This feature could be on or off.

- Variable offset vector $o$. If the offset vector can be pseudorandomly
  altered, it is very easy to cope with rare cases where either of the above
  methods fails.

If the variable offset vector is available, then either of the two ideas - the lin- 
ear method or the nonlinear method - should work fine by itself. Otherwise, a 
belt-and-braces approach may be best, using the linear and nonlinear methods 
together.

Abstract. We propose a method to determine the critical noise level for
decoding Gallager type low density parity check error correcting codes.
The method is based on the magnetization enumerator ($\mathcal{M}$), rather
than on the weight enumerator ($\mathcal{W}$) presented recently in the
information theory literature. The interpretation of our method is appealingly
simple, and the relation between the different decoding schemes such as typical
pairs decoding, MAP, and finite temperature decoding (MPM) becomes
clear. Our results are more optimistic than those derived via the methods
of information theory and are in excellent agreement with recent results
from another statistical physics approach.



1 Introduction 

Triggered by active investigations on error correcting codes in both the
information theory (IT) [1,2,3] and statistical physics (SP) [4,5] communities,
there is a growing interest in the relationship between IT and SP. As the two
communities
investigate similar problems, one may expect that standard techniques known 
in one framework would bring about new developments in the other, and vice 
versa. Here we present a direct SP method to determine the critical noise level 
for Gallager type low density parity check codes which allows us to focus on the 
differences between the various decoding criteria and their approach for defin- 
ing the critical noise level for which decoding, using Low Density Parity Check 
(LDPC) codes, is theoretically feasible. 

2 Gallager Code 

In a general scenario, the $N$ dimensional Boolean message $s^0 \in \{0,1\}^N$
is encoded to the $M (> N)$ dimensional Boolean vector $t^0$, and transmitted
via a noisy channel, which is taken here to be a Binary Symmetric Channel (BSC)
characterized by an independent flip probability $p$ per bit; other
transmission channels may also be examined within a similar framework. At the
other end of the channel, the corrupted codeword is decoded utilizing the
structured codeword redundancy.

The error correcting code that we focus on here is Gallager’s linear code [6].
Gallager’s code is a low density parity check code defined by a binary
$(M-N) \times M$ matrix $A = [C_1 | C_2]$, concatenating two very sparse
matrices known to both sender and receiver, with the $(M-N) \times (M-N)$
matrix $C_2$ being invertible. The matrix $A$ has $K$ non-zero elements per row
and $C$ per column, and the code rate is given by $R = 1 - C/K = N/M$. Encoding
refers to multiplying the original message $s^0$ with the $(M \times N)$ matrix
$G^T$ (where $G = [I \,|\, (C_2^{-1} C_1)^T]$), yielding the transmitted vector
$t^0$. Note that all operations are carried out in (mod 2) arithmetic. Upon
sending $t^0$ through the binary symmetric channel (BSC) with noise level $p$,
the vector $r = t^0 + n^0$ is received, where $n^0$ is the true noise.
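
The encoding step, and the fact that the syndrome $Ar$ formed by the decoder depends only on the noise, can be illustrated on a toy example. Everything in the sketch below is a hypothetical assumption chosen for readability: $N = 3$, $M = 6$, a small dense $C_1$, and $C_2$ equal to the identity (so that $C_2^{-1} C_1 = C_1$). It does not respect the fixed row and column weights $K$ and $C$ of an actual Gallager code.

```python
def mat_vec_mod2(matrix, vector):
    """Multiply a binary matrix (list of rows) by a binary vector, mod 2."""
    return [sum(a * b for a, b in zip(row, vector)) % 2 for row in matrix]

n_msg = 3
identity = [[1 if i == j else 0 for j in range(n_msg)] for i in range(n_msg)]

C1 = [[1, 1, 0],
      [0, 1, 1],
      [1, 0, 1]]
# A = [C1 | C2] with C2 = identity (toy choice, trivially invertible)
A = [c1_row + id_row for c1_row, id_row in zip(C1, identity)]
# G^T stacks the identity on top of C2^{-1} C1 = C1
G_T = identity + C1

s0 = [1, 0, 1]                              # message
t0 = mat_vec_mod2(G_T, s0)                  # transmitted codeword
n0 = [0, 0, 0, 0, 1, 0]                     # one bit flipped by the channel
r = [(a + b) % 2 for a, b in zip(t0, n0)]   # received vector
z = mat_vec_mod2(A, r)                      # syndrome

print(t0, r, z)
```

Because $AG^T = 0$ (mod 2), the codeword contributes nothing to the syndrome, which is why $z$ equals $An^0$.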

Decoding is carried out by multiplying $r$ by $A$ to produce the syndrome
vector $z = Ar$ ($= An^0$, since $AG^T = 0$). In order to reconstruct the
original message $s^0$, one has to obtain an estimate $n$ for the true noise
$n^0$. First we select all $n$ that satisfy the parity checks $An = An^0$:

$$I_{pc}(A, n^0) = \{ n \mid An = z \}, \quad \text{and} \quad
  I^r_{pc}(A, n^0) = \{ n \in I_{pc}(A, n^0) \mid n \ne n^0 \}, \qquad (1)$$

the (restricted) parity check set. Any general decoding scheme then consists
of selecting a vector $n^*$ from $I_{pc}(A, n^0)$ on the basis of some noise
statistics criterion. Upon successful decoding $n^0$ will be selected, while a
decoding error is declared when a vector $n^* \in I^r_{pc}(A, n^0)$ is selected.
A measure for the error probability is usually defined in the information
theory literature [7] as

$$P_e(p) = \left\langle \Delta\left( \exists\, n \in I^r_{pc}(A, n^0) : w(n) \le w(n^0) \mid n^0 \right) \right\rangle_{A, n^0}, \qquad (2)$$

where $\Delta(\cdot)$ is an indicator function returning 1 if there exists a
vector $n \in I^r_{pc}(A, n^0)$ with lower weight than that of the given noise
vector $n^0$. The weight of a vector is the average sum of its components,
$w(n) = \frac{1}{M} \sum_{j=1}^{M} n_j$. To obtain the error probability, one
averages the indicator function over all $n^0$ vectors drawn from some
distribution and the code ensemble $A$, as denoted by
$\langle \cdot \rangle_{A, n^0}$.

Carrying out averages over the indicator function is difficult, and the error
probability (2) is therefore upper-bounded by averaging over the number of
vectors $n$ obeying the weight condition $w(n) \le w(n^0)$. Alternatively, one
can find the average number of vectors with a given weight value $w$ from which
one can construct a complete weight distribution of noise vectors $n$ in
$I^r_{pc}(A, n^0)$. From this distribution one can, in principle, calculate a
bound for $P_e$ and derive critical noise values above which successful
decoding cannot be carried out.

A natural and direct measure for the average number of states is the entropy 
of a system under the restrictions described above, that can be calculated via 
the methods of statistical physics. 

It was previously shown (see e.g. [4] for technical details) that this problem
can be cast into a statistical mechanics formulation, by replacing the field
$(\{0,1\}, + \bmod 2)$ by $(\{1,-1\}, \times)$, and by adapting the parity
checks correspondingly. The statistics of a noise vector $n$ is now described
by its magnetization $m(n) = \frac{1}{M} \sum_{j=1}^{M} n_j$
($m(n) \in [1,-1]$), which is inversely linked to the vector weight in the
$[0,1]$ representation. With this in mind, we introduce the conditioned
magnetization enumerator, for a given code and noise, measuring the noise
vector magnetization distribution in $I^r_{pc}(A, n^0)$

$$\mathcal{M}_{A,n^0}(m) = \frac{1}{M} \ln\left[ \operatorname{Tr}_{n \in I^r_{pc}(A,n^0)} \delta(m(n) - m) \right]. \qquad (3)$$

To obtain the magnetization enumerator $\mathcal{M}(m)$

$$\mathcal{M}(m) = \left\langle \mathcal{M}_{A,n^0}(m) \right\rangle_{A,n^0}, \qquad (4)$$

which is the entropy of the noise vectors in $I^r_{pc}(A, n^0)$ with a given
$m$, one carries out uniform explicit averages over all codes $A$ with given
parameters $K$, $C$, and weighted average over all possible noise vectors
generated by the BSC, i.e.,

$$P(n^0) = \prod_{j=1}^{M} \left( (1-p)\,\delta(n^0_j - 1) + p\,\delta(n^0_j + 1) \right). \qquad (5)$$

It is important to note that, in calculating the entropy, the average quantity 
of interest is the magnetization enumerator rather than the actual number of 
states. For physicists, this is the natural way to carry out the averages due to 
three main reasons: a) The entropy obtained in this way is believed to be self- 
averaging, i.e., its average value (over the disorder) coincides with its typical 
value, b) This quantity is extensive and grows linearly with the system size, c) 
This averaging distinguishes between annealed variables that are averaged or 
summed for a given set of quenched variables, that are averaged over later on. 
In this particular case, summation over all n vectors is carried for a fixed choice 
of code A and noise vector n°; averages over these variables are carried out at 
the next level. 

One should point out that in somewhat similar calculations, we showed that 
this method of carrying out the averages provides more accurate results in com- 
parison to averaging over both sets of variables simultaneously [8]. 

A positive magnetization enumerator, $\mathcal{M}(m) > 0$, indicates that there is an
exponential number of solutions (in $M$) with magnetization $m$ for typically
chosen $A$ and $n^0$, while $\mathcal{M}(m) \to 0$ indicates that this number vanishes as
$M \to \infty$ (note that negative entropy is unphysical in discrete systems).

Another important indicator for successful decoding is the overlap $\omega$ between
the selected estimate $n^*$ and the true noise $n^0$:
$\omega(n,n^0) = \frac{1}{M}\sum_{j=1}^{M} n_j n^0_j$ ($\omega(n,n^0) \in [-1,1]$), with
$\omega = 1$ for successful (perfect) decoding. However, this quantity cannot be used
for decoding as $n^0$ is unknown to the receiver. The (code and noise dependent)
overlap enumerator is now defined as

$$\mathcal{W}_{A,n^0}(\omega) = \frac{1}{M}\ln\Big[\mathop{\mathrm{Tr}}_{n \in I_{pc}(A,n^0)} \delta(\omega(n,n^0) - \omega)\Big], \tag{6}$$

with the average quantity being

$$\mathcal{W}(\omega) = \big\langle \mathcal{W}_{A,n^0}(\omega) \big\rangle_{A,n^0}. \tag{7}$$
This measure is directly linked to the weight enumerator [3], although according
to our notation, averages are carried out distinguishing between annealed and
quenched variables, unlike the common definition in the IT literature. However,
as we will show below, the two types of averages provide identical results in this
particular case.



3 The Statistical Physics Approach 



Quantities of the type $Q(c) = \langle Q_y(c) \rangle_y$, with $Q_y(c) = \frac{1}{M}\ln[Z_y(c)]$ and
$Z_y(c) = \mathrm{Tr}_x\, \delta(c(x,y) - Mc)$, are very common in the SP of disordered
systems; the macroscopic order parameter $c(x,y)$ is fixed to a specific value and
may depend both on the disorder $y$ and on the microscopic variables $x$. Although
we will not prove this here, such a quantity is generally believed to be
self-averaging in the large system limit, i.e., obeying a probability distribution
$P(Q_y(c)) = \delta(Q_y(c) - Q(c))$. The direct calculation of $Q(c)$ is known as
a quenched average over the disorder, but is typically hard to carry out and
requires using the replica method [9]. The replica method makes use of the
identity $\langle \ln Z \rangle = \langle \lim_{n \to 0} [Z^n - 1]/n \rangle$, by calculating averages over a
product of partition function replicas. Employing assumptions about replica
symmetries and analytically continuing the variable $n$ to zero, one obtains
solutions which enable one to determine the state of the system.

To simplify the calculation, one often employs the so-called annealed
approximation, which consists of performing an average over $Z_y(c)$ first, followed
by the logarithm operation. This avoids the replica method and provides (through
the concavity of the logarithm function) an upper bound to the quenched quantity:

$$Q_a(c) = \frac{1}{M}\ln\big[\langle Z_y(c) \rangle_y\big] \;\ge\; Q_q(c) = \frac{1}{M}\big\langle \ln[Z_y(c)] \big\rangle_y = \lim_{n \to 0} \frac{\langle Z_y^n(c) \rangle_y - 1}{nM}. \tag{8}$$
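
The inequality in Eq. (8) is Jensen's inequality for the logarithm, and it can be checked numerically. The sketch below assumes a hypothetical toy model in which ln Z fluctuates as a Gaussian over the disorder; this model, the parameter values and the variable names are assumptions for illustration and are not the code ensemble treated in this paper.

```python
import math
import random

random.seed(0)
M = 50            # assumed system size
samples = 20000   # number of disorder realisations y

# Toy model: ln Z_y = M * (s + g_y), with g_y a zero-mean Gaussian fluctuation.
s, sigma = 0.3, 0.05
log_Z = [M * (s + random.gauss(0.0, sigma)) for _ in range(samples)]

# Quenched: average of the logarithm, (1/M) <ln Z_y>_y.
Q_q = sum(log_Z) / (samples * M)

# Annealed: logarithm of the average, (1/M) ln <Z_y>_y,
# computed with the log-sum-exp trick for numerical stability.
top = max(log_Z)
Q_a = (top + math.log(sum(math.exp(x - top) for x in log_Z) / samples)) / M

print(f"quenched Q_q = {Q_q:.4f}, annealed Q_a = {Q_a:.4f}")
```

The annealed value is dominated by rare realisations with unusually large Z, which is the mechanism by which it overestimates the typical entropy.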



The full technical details of the calculation will be presented elsewhere, and
those of a very similar calculation can be found in e.g. [4]. It turns out that it
is useful to perform the gauge transformation $n_j \to n_j n^0_j$, such that the averages
over the code $A$ and noise $n^0$ can be separated; $\mathcal{W}_{A,n^0}$ becomes independent of
$n^0$, leading to an equality between the quenched and annealed results,
$\mathcal{W}(m) = \mathcal{M}_a(m)|_{p=0} = \mathcal{M}_q(m)|_{p=0}$. For any finite noise value $p$ one
should multiply $\exp[\mathcal{W}(\omega)]$ by the probability that a state obeys all parity
checks, $\exp[-K(\omega,p)]$, given an overlap $\omega$ and a noise level $p$ [3]. In
calculating $\mathcal{W}(\omega)$ and $\mathcal{M}_{a/q}(m)$, the $\delta$-functions fixing $m$ and $\omega$ are
enforced by introducing Lagrange multipliers $\hat{m}$ and $\hat{\omega}$.

Carrying out the averages explicitly one then employs the saddle point 
method to extremize the averaged quantity with respect to the parameters in- 
troduced while carrying out the calculation. These lead, in both quenched and 
annealed calculations, to a set of saddle point equations that are solved either an- 
alytically or numerically to obtain the final expression for the averaged quantity 
(entropy) . 
The final expressions for the annealed entropy, under both overlap ($\omega$) and
magnetization ($m$) constraints, are of the form:

$$Q_a = -\frac{C}{K}\Big(\ln(2) + (K-1)\ln\big(1 + q_1^K\big)\Big) + \ln\Big\langle \mathop{\mathrm{Tr}}_{n=\pm 1} [\ldots] \Big\rangle_{n^0} - \hat{\omega}\omega - \hat{m}m, \tag{9}$$

where $q_1$ has to be obtained from the saddle point equation $\partial Q_a/\partial q_1 = 0$. Similarly,
the final expression in the quenched calculation, employing the simplest replica
symmetry assumption [9], is of the form:



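$$Q_q = -C \int dx\, d\hat{x}\; \pi(x)\, \hat{\pi}(\hat{x}) \ln[1 + x\hat{x}] + \frac{C}{K} \int [\ldots] + \int \Big\{ \prod_{c=1}^{C} d\hat{x}_c\, \hat{\pi}(\hat{x}_c) \Big\} \Big\langle \ln\Big[ \mathop{\mathrm{Tr}}_{n=\pm 1} \exp\big(n(\hat{\omega} + \hat{m} n^0)\big) \prod_{c=1}^{C} (1 + n\hat{x}_c) \Big] \Big\rangle_{n^0} - \hat{\omega}\omega - \hat{m}m. \tag{10}$$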



The probability distributions $\pi(x)$ and $\hat{\pi}(\hat{x})$ emerge from the calculation; the
former represents a probability distribution with respect to the noise vector local
magnetization [10], while the latter relates to a field of conjugate variables which
emerge from the introduction of $\delta$-functions while carrying out the averages (for
details see [4]). Their explicit forms are obtained from the functional saddle
point equations $\delta Q_q/\delta\pi(x) = 0$, $\delta Q_q/\delta\hat{\pi}(\hat{x}) = 0$; all integrals are from $-1$ to $1$.
Enforcing a $\delta$-function corresponds to taking $\hat{\omega}$, $\hat{m}$ such that
$\partial Q/\partial\hat{\omega} = \partial Q/\partial\hat{m} = 0$, while not enforcing it corresponds to putting $\hat{\omega}$, $\hat{m}$
to 0. Since $\omega$, $m$ follow from $\partial Q/\partial\hat{\omega} = 0$, $\partial Q/\partial\hat{m} = 0$, all the relevant
quantities can be recovered with appropriate choices of $\hat{\omega}$, $\hat{m}$.



4 Qualitative Picture 

We now discuss the qualitative behaviour of $\mathcal{M}(m)$, and the interpretation of
the various decoding schemes. To obtain separate results for $\mathcal{M}(m)$ and $\mathcal{W}(m)$
we calculate the results of Eqs. (9) and (10), corresponding to the annealed and
quenched cases respectively, setting $\hat{\omega} = 0$ for obtaining $\mathcal{M}(m)$ and $\hat{m} = 0$
for obtaining $\mathcal{W}(\omega)$ (that becomes $\mathcal{W}(m)$ after gauging). In Fig. 1, we have
qualitatively plotted the resulting function $\mathcal{M}(m)$ for relevant values of $p$.
$\mathcal{M}(m)$ (solid line) only takes positive values in the interval $[m_-(p), m_+(p)]$;
for even $K$, $\mathcal{M}(m)$ is an even function of $m$ and $m_+(p) = -m_-(p)$. The maximum
value of $\mathcal{M}(m)$ is always $(1 - R)\ln(2)$. The true noise $n^0$ has (with probability 1)
the typical magnetization of the BSC: $m(n^0) = m_0(p) = 1 - 2p$ (dashed-dotted line).

Fig. 1. The qualitative picture of $\mathcal{M}(m) > 0$ (solid lines) for different values of $p$.
For MAP, MPM and typical set decoding, only the relative values of $m_+(p)$ and $m_0(p)$
determine the critical noise level. Dashed lines correspond to the energy contribution
of $-\beta F$ at Nishimori's condition ($\beta = 1$). The states with the lowest free energy are
indicated with •. a) Sub-critical noise levels $p < p_c$, where $m_+(p) < m_0(p)$: there are
no solutions with higher magnetization than $m_0(p)$, and the correct solution has the
lowest free energy. b) Critical noise level $p = p_c$, where $m_+(p) = m_0(p)$: the minimum
of the free energy of the sub-optimal solutions is equal to that of the correct solution
at Nishimori's condition. c) Over-critical noise levels $p > p_c$, where many solutions have
a higher magnetization than the true typical one: the minimum of the free energy of
the sub-optimal solutions is lower than that of the correct solution.

The various decoding schemes can be summarized as follows:

- Maximum likelihood (MAP) decoding: minimizes the block error
probability [11] and consists of selecting the $n$ from $I_{pc}(A,n^0)$ with the
highest magnetization. Since the probability of error below $m_+(p)$ vanishes,
$P(\exists n \in I_{pc} : m(n) > m_+(p)) = 0$, and since $P(m(n^0) = m_0(p)) = 1$, the
critical noise level $p_c$ is determined by the condition $m_+(p_c) = m_0(p_c)$. The
selection process is explained in Fig. 1(a)-(c).

- Typical pairs decoding: is based on randomly selecting an $n$ from $I_{pc}$
with $m(n) = m_0(p)$ [3]; an error is declared when $n^0$ is not the only element
of $I_{pc}$. For the same reason as above, the critical noise level $p_c$ is determined
by the condition $m_+(p_c) = m_0(p_c)$.

- Finite temperature (MPM) decoding: An energy $-F m(n)$ (with
$F = \frac{1}{2}\ln\frac{1-p}{p}$), according to Nishimori's condition (which corresponds to
the selection of an accurate prior within the Bayesian framework), is attributed
to each $n \in I_{pc}$, and a solution is chosen from those with the magnetization that
minimizes the free energy [4]. This procedure is known to minimize the bit error
probability [11]. Using the thermodynamic relation $\mathcal{F} = U - \frac{1}{\beta}S$, $\beta$ being the
inverse temperature (Nishimori's condition corresponds to setting $\beta = 1$), the free
energy of the sub-optimal solutions is given by $\mathcal{F}(m) = -Fm - \frac{1}{\beta}\mathcal{M}(m)$ (for
$\mathcal{M}(m) > 0$), while that of the correct solution is given by $-F m_0(p)$ (its
entropy being 0). The selection process is explained graphically in Fig. 1(a)-(c).
The free energy differences between sub-optimal solutions and the correct
solution are given, in the current plots, by the orthogonal distance
between $\mathcal{M}(m)$ and the line with slope $-\beta F$ through the point $(m_0(p), 0)$.
Solutions with a magnetization $m$ for which $\mathcal{M}(m)$ lies above this line have
a lower free energy, while those for which $\mathcal{M}(m)$ lies below have a higher
free energy. Since negative entropy values are unphysical in discrete systems,
only sub-optimal solutions with $\mathcal{M}(m) > 0$ are considered. The lowest
$p$ value for which there are sub-optimal solutions with a free energy equal
to $-F m_0(p)$ is the critical noise level $p_c$ for MPM decoding. In fact, using
the convexity of $\mathcal{M}(m)$ and Nishimori's condition, one can show that the
slope $d\mathcal{M}(m)/dm > -\beta F$ for any value $m < m_0(p)$ and any $p$, and equals
$-\beta F$ only at $m = m_0(p)$; therefore, the critical noise level for MPM decoding
$p = p_c$ is identical to that of MAP, in agreement with results obtained in the
information theory community [12].

The statistical physics interpretation of finite temperature decoding
corresponds to making the specific choice for the Lagrange multiplier $\hat{m} = \beta F$ and
considering the free energy instead of the entropy. In earlier work on MPM
decoding in the SP framework [4], negative entropy values were treated by
adopting different replica symmetry assumptions, which effectively result
in changing the inverse temperature, i.e., the Lagrange multiplier $\hat{m}$. This
effectively sets $m = m_+(p)$, i.e. to the highest value with non-negative
entropy. The sub-optimal states with the lowest free energy are then those
with $m = m_+(p)$.



The central point in all decoding schemes is to select the correct solution only
on the basis of its magnetization. As long as there are no sub-optimal solutions
with the same magnetization, this is in principle possible. As shown here, all
three decoding schemes discussed above manage to do so. Finding whether at a
given $p$ there exists a gap between the magnetization of the correct solution and
that of the nearest sub-optimal solution just requires plotting $\mathcal{M}(m)$ ($> 0$) and
$m_0(p)$, thus allowing a graphical determination of $p_c$. Since MPM decoding is
done at Nishimori's temperature, the simplest replica symmetry assumption is
sufficient to describe the thermodynamically dominant state [9]. At $p_c$ the states
with $m_+(p_c) = m_0(p_c)$ are thermodynamically dominant, and the $p_c$ values that
we obtain under this assumption are exact.
5 Critical Noise Level - Results

Some general comments can be made about the critical MAP (or typical set)
values obtained via the annealed and quenched calculations. Since
$\mathcal{M}_q(m) \le \mathcal{M}_a(m)$ (for given values of $K$, $C$ and $p$), we can derive the general
inequality $p_{c,q} \ge p_{c,a}$. For all $K$, $C$ values that we have numerically analyzed,
for both annealed and quenched cases, $m_+(p)$ is a non-increasing function of $p$,
and $p_c$ is unique. The estimates of the critical noise levels $p_{c,a/q}$, based on
$\mathcal{M}_{a/q}$, are obtained by numerically calculating $m_{+,a/q}(p)$ and by determining
their intersection with $m_0(p)$. This is explained graphically in Fig. 2(a). As the
results for MPM decoding have already been presented elsewhere [13], we will now
concentrate on the critical results $p_c$ obtained for typical set and MAP decoding;
these are presented in Fig. 2(b), where the values of $p_{c,a/q}$ for various choices of
$K$ and $C$ are compared with those reported in the literature.

Fig. 2. a) Determining the critical noise levels $p_{c,a/q}$ based on the function $\mathcal{M}_{a/q}$,
a qualitative picture. b) Comparison of different critical noise level ($p_c$) estimates.
Typical set decoding estimates have been obtained via the methods of IT [3], based on
having a unique solution to $\mathcal{W}(m) = K(m, p_c)$, as well as using the methods of SP [14].
The numerical precision is up to the last digit for the current method. Shannon's limit
denotes the highest theoretically achievable critical noise level $p_c$ for any code [15].
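
For orientation, the Shannon limit referred to in Fig. 2(b) can be computed numerically. The sketch below assumes the standard BSC capacity 1 - H2(p) and a code rate R = 1 - C/K for a regular code with K bits per check and C checks per bit; the function names and the list of (K, C) pairs are illustrative assumptions, not values taken from the figure.

```python
import math

def binary_entropy(p):
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def shannon_limit(rate, tol=1e-12):
    """Largest flip probability p in [0, 1/2] with 1 - H2(p) >= rate (bisection)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - binary_entropy(mid) >= rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for K, C in [(6, 3), (5, 3), (6, 4), (4, 3)]:   # assumed example parameters
    R = 1 - C / K
    print(f"K={K}, C={C}, R={R:.3f}, Shannon limit p = {shannon_limit(R):.4f}")
```

Any critical noise level obtained from the enumerators should lie at or below the value returned for the same rate.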

From the table it is clear that the annealed approximation gives a much more
pessimistic estimate for $p_c$. This is due to the fact that it overestimates $\mathcal{M}$ in
the following way. $\mathcal{M}_a(m)$ describes the combined entropy of $n$ and $n^0$ as if $n^0$
were thermal variables as well. Therefore, exponentially rare events for $n^0$ (i.e.
$m(n^0) \ne m_0(p)$) may still carry positive entropy due to the addition of a positive
entropy term from $n$. In a separate study [14] these effects have been taken care
of by the introduction of an extra exponent; this is not necessary in the current
formalism as the quenched calculation automatically suppresses such
contributions. The similarity between the results reported here and those obtained
in [8] is not surprising, as the equations obtained in quenched calculations are
similar to those obtained by averaging the upper bound to the reliability exponent
using a method presented originally by Gallager [6]. Numerical differences between
the two sets of results are probably due to the higher numerical precision here.
6 Conclusions

To summarize, we have shown that the magnetization enumerator $\mathcal{M}(m)$ plays a
central role in determining the achievable critical noise level for various decoding
schemes. The formalism based on the magnetization enumerator $\mathcal{M}$ offers an
intuitively simple alternative to the weight enumerator formalism as used in
typical pairs decoding [3,14], but requires invoking the replica method given
the very low critical values obtained by the annealed approximation calculation.
Although we have concentrated here on the critical noise level for the BSC, both
other channels and other quantities can also be treated in our formalism. The
predictions for the critical noise level are more optimistic than those reported
in the IT literature, and are up to numerical precision in agreement with those
reported in [14]. Finally, we have shown that the critical noise levels for typical
pairs, MAP and MPM decoding must coincide, and we have provided an intuitive
explanation of the difference between MAP and MPM decoding.

Support by Grants-in-aid, MEXT (13680400) and JSPS (YK), The Royal Society and 
EPSRC-GR/N00562 (DS/JvM) is acknowledged. 
Abstract. The performance of a new method for decoding binary error
correcting codes is presented, and compared with established hard and soft 
decision decoding methods. The new method uses a modified form of the max- 
sum algorithm, which is applied to a split (partially disconnected) modification 
of the Tanner graph of the code. Most useful codes have Tanner graphs that 
contain cycles, so the aim of the split is to convert the graph into a tree graph. 
Various split graph configurations have been investigated, the best of which 
have decoding performances close to maximum likelihood. 



1. Introduction 

This paper describes some new results arising from a continuing investigation [1, 2, 3] 
into the graph decoding of binary linear block error-correcting codes. Graph 
algorithms have been applied very successfully to the decoding of long and powerful 
codes, such as the low density parity check (LDPC) codes [4], where performances 
comparable to that of turbo codes [5] have been achieved. In the case of short or 
medium length codes, however, the existence of cycles or loops in a code graph leads 
to difficulties in applying the various decoding algorithms, because of instability or 
lack of convergence. Unfortunately, almost all the codes of interest (i.e., with 
optimum or near-optimum parameters), whatever their length, have graphs containing 
cycles [6]. In the case of long codes the problem does not arise, as the algorithms are 
stable and converge, for reasons which are still not entirely clear. In the case of short 
and medium length codes it is necessary to find ways of modifying the code graph 
and/or the decoding algorithm, so as to reduce or remove cycles and thus achieve the 
potential simplicity and performance of graph decoding. In [1] a method of splitting 
(partially disconnecting) a code graph so as to remove cycles was introduced, and 
further investigated in [2]. Subsequently this method was studied [3] to evaluate its 
performance and the effect of reconfiguring the code graph in various different ways. 
The main results of the study are presented in this paper. The fully disconnected sub- 
graph configuration has also been independently studied [7], with similar results 
being obtained. 
2. The Code Graph

2.1 Full Graph 

The Tanner [8] or factor graph of an (n,k) code (block length n and n - k parity 
checks) is a bipartite graph representing the parity check relationships of the code. 
Each position (bit) of a codeword in the code is represented by one of a first set of n 
nodes in the graph. All the position nodes in a parity relationship are joined by edges 
to one of a second set of n - k nodes called parity (or check) nodes. If two position
nodes are connected to the same pair of parity nodes then there is a cycle or loop in
the graph. There will be many such cycles in the graph of a useful code.

Each parity check relationship of a code is represented by a row in the parity check 
matrix of the code, and each non-zero symbol in the row corresponds to an edge in the 
graph. Minimising the number of edges connected to each parity node minimises the 
number of computations required in the decoding algorithm [3, 9], and may also 
reduce the number of cycles in the graph, sometimes removing them entirely [9]. 
Therefore, since a code can be specified by a number of equivalent parity check 
matrices, it can be important to find the minimum weight matrix from which to 
construct the graph. This minimum weight parity matrix corresponds to the minimum 
span (trellis oriented) form of the generator matrix of the code [10], in the case of 
several simple codes. We conjecture that this will always be the case, but do not have 
a general proof at present. If the graph corresponding to the minimum weight parity
matrix contains cycles, or if a graph with cycles and more than the minimum number 
of edges is used for other reasons, then in the case of short and medium length codes 
it is necessary to modify either the graph or the algorithm, or both, before it can be 
decoded effectively. 

2.2 Split Graph 

The graph modification introduced in [1, 2] and studied in [3] splits the graph so as to 
break all the cycles, thus creating a tree graph. This is done by disconnecting an 
appropriate position node in each cycle into two sub-nodes (or more if more than one 
cycle is being simultaneously split), so that one or more position nodes will appear as 
two or more sub-nodes in the modified graph. The number of edges remains the same, 
but the number of nodes increases. Clearly, there are a number of ways of splitting a 
given graph. So-called line, star and decomposed configurations [3,7] were found 
interesting, although the research reported in [3] did not investigate the decomposed 
configuration. The first two configurations generate a connected tree graph, and the 
third a set of n - k disconnected tree sub-graphs. For example, Figure 1 shows the
Tanner graph for the Hamming (7,4,3) code (block length n = 7, dimension k = 4 and
minimum distance d = 3) with minimum weight parity check matrix

$$[H] = \begin{bmatrix} 0&1&0&1&1&0&1 \\ 1&0&0&1&0&1&1 \\ 1&0&1&0&1&0&1 \end{bmatrix}$$

together with line, star and decomposed split graph configurations, where the split
nodes are identified by a pair of numbers. In addition, a number of different parity
check matrices have been investigated for each code. These different matrices, some
minimum weight and others not, all correspond to equivalent codes (i.e., codes with
the same (n,k,d) parameters).
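
The relationship between [H] and its Tanner graph can be made concrete with a short sketch. It builds the edge list for the (7,4,3) matrix given above, lists the length-4 cycles (two position nodes joined to the same pair of parity nodes), and counts independent cycles with the standard formula edges - nodes + components. The helper names are illustrative and are not taken from [1-3].

```python
from itertools import combinations

H = [
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def tanner_edges(H):
    """One edge (parity node r, position node c) per non-zero entry of H."""
    return [(r, c) for r, row in enumerate(H) for c, v in enumerate(row) if v]

def four_cycles(H):
    """Pairs of position nodes joined to the same pair of parity nodes."""
    n = len(H[0])
    found = []
    for c1, c2 in combinations(range(n), 2):
        shared = [r for r, row in enumerate(H) if row[c1] and row[c2]]
        for r1, r2 in combinations(shared, 2):
            found.append((c1, c2, r1, r2))
    return found

def independent_cycles(H):
    """Cyclomatic number: edges - nodes + components (0 for a tree or forest)."""
    n_par, n_pos = len(H), len(H[0])
    parent = list(range(n_par + n_pos))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = tanner_edges(H)
    for r, c in edges:
        parent[find(r)] = find(n_par + c)   # position nodes are offset by n_par
    components = len({find(x) for x in range(n_par + n_pos)})
    return len(edges) - (n_par + n_pos) + components

print("edges:", len(tanner_edges(H)))
print("length-4 cycles:", len(four_cycles(H)))
print("independent cycles:", independent_cycles(H))
```

A split configuration is a tree exactly when the last count drops to zero for the modified graph, so the same function can be used to check a proposed split.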




Fig. 1. Full and split graphs for the (7,4,3) code 

(a) : Tanner graph 

(b) : line configuration 

(c) : star configuration 

(d) : decomposed configuration 



3. The Decoding Algorithm 

The max-sum algorithm, or its equivalent the min-difference algorithm (as the code is 
binary) [2,11], has been used. As there are no cycles in the split graph, the algorithm 
can be applied in the normal manner [2, 3, 11], the computations flowing in from the 
leaf nodes and out again, but with a modification as to how the metrics at the split 
position nodes are handled. If a node has been split into s sub-nodes, then the initial 
metric value for this node is divided by s to determine the initial metric value for each 
sub-node, since otherwise the metrics of these nodes will tend to dominate in the 
computations. After completion of the algorithm computation, the final metric
value of the split position node is the sum of the s sub-node final metric values. In
addition, it is necessary to iterate the algorithm several times until the final metric
values converge. As before, at each iteration of the algorithm the new initial metrics
for the s sub-nodes of a split node are determined by dividing the previous final
metric at that node by s. This effectively averages the sub-node metric values before
each iteration. In the case of the decomposed (disconnected) configuration, each
iteration of the modified algorithm is applied to the n - k sub-graphs simultaneously.
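
The metric bookkeeping described above can be sketched for the decomposed configuration, where every parity node forms its own small tree. This is one possible reading and not the authors' implementation: it assumes log-likelihood-ratio metrics (positive favours bit 0) and applies the usual min-sum rule at each parity node. The function name, the received vector and the iteration count are hypothetical.

```python
H = [
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def decode_decomposed(H, metrics, iterations=5):
    """Iterate min-sum on the n - k single-check sub-graphs with split metrics."""
    n = len(H[0])
    checks = [[c for c, v in enumerate(row) if v] for row in H]
    # s[c] = number of sub-nodes of position c (its degree in the Tanner graph).
    s = [sum(row[c] for row in H) for c in range(n)]
    current = list(metrics)
    for _ in range(iterations):
        final = [0.0] * n
        for members in checks:
            # Initial sub-node metric: the node metric divided by s.
            sub = {c: current[c] / s[c] for c in members}
            for c in members:
                others = [sub[o] for o in members if o != c]
                sign = 1.0
                for v in others:
                    if v < 0:
                        sign = -sign
                extrinsic = sign * min(abs(v) for v in others)
                # Final metric of a split node: sum of its sub-node final metrics.
                final[c] += sub[c] + extrinsic
        current = final
    return [0 if m >= 0 else 1 for m in current], current

# Assumed input: all-zero codeword sent, position 1 received weakly and wrongly.
received = [2.0, -1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
bits, final_metrics = decode_decomposed(H, received)
print(bits)
```

In this example the single unreliable position is pulled back to the correct sign within a few iterations. The metrics are not normalised between iterations, so their magnitudes grow; only their signs are used for the decision.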
4. Performance Results

A number of computer simulations were run to assess the performance of several 
relatively simple binary error-correcting codes using the split graph decoding 
technique. 



4.1 The (7,4,3) Code 

Figure 2 shows the performance in additive white Gaussian noise (AWGN) of the 
Hamming (7,4,3) code, as plots of bit error rate (log (BER)) versus energy per 
information bit to noise spectral density ratio (Eb/No in dB). The figure allows the 
performance of hard-decision (minimum distance) decoding, soft-decision (Viterbi 
algorithm (VA) maximum likelihood trellis) decoding and split graph decoding to be 
compared. The minimum weight parity check matrix given above in section 2.2 was 
used for the split graph curve, and convergence was achieved with 5 iterations, using 
the line configuration in Figure 1(b) as the structure of the split graph. There was
almost no difference in the performance of the code when using other minimum 
weight [H] matrices with line configurations, but using star configurations slightly 
degraded the performance (by about a tenth of a dB at an error rate of 10'^). 
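
For readers reproducing such curves, the Eb/No axis has to be converted into a noise standard deviation for the simulation. The sketch below assumes unit-energy antipodal (+1/-1) signalling, which the paper does not state explicitly, and the function names are illustrative.

```python
import math

def noise_sigma(ebno_db, n, k):
    """AWGN standard deviation per symbol for +/-1 signalling and an (n, k) code."""
    rate = k / n
    ebno = 10.0 ** (ebno_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebno))

def uncoded_ber(ebno_db):
    """Bit error rate of uncoded antipodal signalling, Q(sqrt(2 Eb/No))."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebno))

for db in (0, 2, 4, 6):
    print(db, round(noise_sigma(db, 7, 4), 4), f"{uncoded_ber(db):.2e}")
```

The rate factor k/n accounts for the energy spent on parity bits, so a coded system sees a larger sigma than an uncoded one at the same Eb/No.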




4.2 The (9,5,3) Code 

Figure 3 shows the performance in AWGN of the (9,5,3) generalised array code
(GAC) [1]. Again, hard-decision, soft-decision (VA) and split graph decoding are
compared. The non-minimum weight matrix

$$[H] = \begin{bmatrix} 0&1&0&1&1&0&1&0&1 \\ 0&1&1&0&0&1&1&0&1 \\ 1&0&1&0&0&1&0&1&1 \\ 1&0&0&1&0&1&1&0&1 \end{bmatrix}$$

was used, with the line configuration of Figure 4, and 7 iterations were required. As
with the (7,4,3) code, the best star configuration had slightly degraded performance
compared to the best line configuration.
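
Whether a given [H] is minimum weight can be checked by brute force for codes this small. The sketch below enumerates the row space of the (9,5,3) matrix above and greedily picks the lightest linearly independent rows, which yields a basis of minimum total weight. It is an illustration only; the function names are not taken from [3, 9].

```python
from itertools import product

H_953 = [
    [0, 1, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 1],
]

def row_space(H):
    """All non-zero GF(2) combinations of the rows of H."""
    n = len(H[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(H)):
        word = tuple(sum(c * row[j] for c, row in zip(coeffs, H)) % 2
                     for j in range(n))
        if any(word):
            words.add(word)
    return words

def rank_gf2(rows):
    """Rank over GF(2), using an XOR basis keyed by the highest set bit."""
    pivots = {}
    for r in rows:
        v = int("".join(map(str, r)), 2)
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def minimum_weight_basis(H):
    """Greedy choice of the lightest independent words in the row space."""
    target = rank_gf2(H)
    basis = []
    for word in sorted(row_space(H), key=lambda w: (sum(w), w)):
        if rank_gf2(basis + [word]) > len(basis):
            basis.append(word)
        if len(basis) == target:
            break
    return basis

basis = minimum_weight_basis(H_953)
print("weight of given [H]:", sum(map(sum, H_953)))
print("weight of lightest equivalent [H]:", sum(map(sum, basis)))
```

The search is exponential in n - k, so it is only practical for short codes like the ones studied here.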




Fig. 3. Performance of the (9,5,3) code 



4.3 The (16,11,4) Code 

Figure 5 shows the performance in AWGN of the (16,11,4) extended Hamming code,
comparing the hard-decision, soft-decision and split graph results. The following [H]
matrix was used:

$$[H] = \begin{bmatrix} 0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1 \\ 1&1&0&0&1&1&0&0&0&0&1&1&0&0&1&1 \\ 0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1 \\ 1&0&0&1&1&0&0&1&1&0&0&1&1&0&0&1 \\ 0&0&1&1&1&1&0&0&0&0&1&1&1&1&0&0 \end{bmatrix}$$

with the line configuration of Figure 6, which gave the best results of all the line
configurations investigated. 7 iterations were required. Star configurations were not
investigated for this code.




4.4 Other Codes 

In addition to the above codes, the (15,7,5) and (16,5,8) codes were studied. 
Unfortunately it was not possible to validate the results obtained, and it is conjectured 
that the parity check matrices used inadvertently contained mistakes. 



5. Conclusions 

The results for the (7,4,3) and (16,11,4) codes are very close (within a few tenths of a
dB at BER = 10'^) to their maximum likelihood (ML) performance. At very low error 
rates the split graph decoding algorithm, because it is an optimum symbol decoding 
algorithm, does better than the ML (soft decision) algorithm. The good performance 
of the split graph decoding algorithm for these codes is most probably because a 
minimum weight parity check matrix was used (in the case of the (16,11,4) code this 
needs to be confirmed), and because an effective split graph configuration was found. 
The performance of the (9,5,3) code was not as good (about 0.7 dB away from ML at 
BER = 10'^), probably because a non-minimum weight [H] matrix was used. 
Therefore it seems to be important to use a minimum weight parity check matrix, and 
to find an optimum split graph configuration. The balanced symmetrical structures of 
the split graphs for the (7,4,3) and (16,11,4) codes, as against the less symmetrical
graph for the (9,5,3) code, may be a clue here. Except in the case of the (7,4,3) code,
where the matrix and configuration options are limited, different matrices and
configurations lead to a wide range of performance results. The best star configuration
results (where available) were, however, slightly (about 0.1 dB) worse than the best
line configuration results.





Fig. 4. Line configuration split graph for the (9,5,3) code

Fig. 6. Line configuration split graph for the (16,11,4) code



There are no results in this investigation [3] for the decomposed split graph 
configuration, because at the time it was thought that this configuration could not 
perform as well as the others. In [7], however, the authors obtained virtually similar 
results for star and decomposed configurations of split graphs for the (8,4,4) and 
(10,6,3) codes, with performances very close to those presented here. They also 
suggest that an approximate bound on the number of iterations required for 
convergence is the number of edges on the shortest path between the two nodes 
furthest apart in the Tanner graph before it is split, which agrees with our own 
observations. This welcome verification of our results confirms the potential value of 
the split graph decoding technique, and supports our conjecture that incorrect parity 
check matrices were used in the simulations for the (15,7,5) and (16,5,8) codes. 
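
The iteration bound suggested in [7] is the diameter of the unsplit Tanner graph, which is easy to compute by breadth-first search. The sketch below does this for the (7,4,3) matrix of section 2.2; the node numbering and function names are assumptions made for illustration.

```python
from collections import deque

H = [
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def tanner_adjacency(H):
    """Nodes 0..n-1 are position nodes, nodes n..n+m-1 are parity nodes."""
    m, n = len(H), len(H[0])
    adj = {v: [] for v in range(n + m)}
    for r in range(m):
        for c in range(n):
            if H[r][c]:
                adj[c].append(n + r)
                adj[n + r].append(c)
    return adj

def eccentricity(adj, start):
    """Longest shortest-path length (in edges) from start, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(H):
    """Number of edges on the shortest path between the two nodes furthest apart."""
    adj = tanner_adjacency(H)
    return max(eccentricity(adj, v) for v in adj)

print("graph diameter:", diameter(H))
```

For this matrix the diameter is 4 edges, of the same order as the 5 iterations reported for the (7,4,3) code in section 4.1.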

Further research is now needed to correct the mistakes in the simulations for the 
(15,7,5) and (16,5,8) codes, to find or confirm minimum weight [H] matrices and 
optimum splitting configurations for all the codes investigated so far, to prove the 
conjecture that the minimum span generator matrix of a code corresponds to the
minimum weight parity check matrix, to investigate the performance of longer and 
more powerful codes using this decoding technique, and to assess the relative 
computational complexity of this technique compared to other effective decoding 
methods. 
Abstract. In this paper, line coding schemes and their application to the
multi-user adder channel are investigated. The focus is on designing line codes with
higher information per channel use rates than time-sharing. We show that by
combining short multi-user line codes, it is possible to devise longer coding
schemes with rate sums which increase quite rapidly at each iteration of the
construction. Asymptotically, there is no penalty in requiring the coding
schemes to be DC-free.



1 Introduction 

The multiple access adder channel (MAAC) is based on a channel model which
permits simultaneous transmission by two or more users in the same bandwidth
without sub-division in time, frequency or the use of orthogonal codes. This can be
achieved by the use of a multi-user coding scheme (sometimes called a superimposed
scheme) on the adder channel, and also leads to a significantly larger capacity. This
results from the higher combined information rate sum compared, say, to
time-sharing, and hence leads to a potentially more efficient system. A block diagram
of a two-user (M=2) scheme is shown in Figure 1, where the inputs and their associated
sources have independent encoders and a single decoder estimates their combined
output.




Fig. 1. 2-user Multi Access Scheme 

A number of multi-user coding schemes for the MAAC have been constructed [1-4],
and a useful survey of the literature on these schemes before 1980 is given in [5].
Potential practical applications of multi-user coding schemes have been investigated
for local area networks (LANs), using a baseband MAAC model [6]; and for mobile
radio systems, using a pass band MAAC model [7]. In the former case, it is
advantageous to have a DC-free multi-user line coding scheme, as was first explored
in [8]. The penalty for this is a fall in the combined information rate; however, it is
shown in this paper that by using a code combining technique known as the “direct
sum code construction” [9], this rate penalty can be reduced almost entirely as n
increases. The advantage of this technique can be highlighted when compared to
binary multi-user coding schemes (which are not DC-free), where the rate sum is
always constant when combining codes.

The paper is organised as follows. In section 2, line codes and their properties are
introduced, together with an example of a 2-user coding scheme on the adder channel.
Section 3 describes the code combining technique that enables the user codes to
achieve a higher rate than one information bit per channel use (time-sharing).
Trellises for some line codes are constructed in section 4, and conclusions and
suggestions for further work are given in section 5.



2 Line Codes and Their Properties 

It is usually the case that a communication link cannot transmit DC components. An
example of this situation is the telephone network, where AC coupling is provided by
transformers. Furthermore, in base band transmission systems, errors may occur in the
channel due to interference and noise. Line coding is concerned with providing error
control coding for base band transmission while also meeting the requirements
of its medium [10]. Various line codes have been designed to meet a number of
transmission requirements. In choosing a line-coding scheme for base band
transmission, the following parameters are considered: spectral characteristics, bit
synchronisation, error detection, bandwidth compression and noise immunity. For
instance, the Manchester Code (MC) is a popular code since it is DC-free and is
self-clocking. This code, as well as the Coded Mark Inversion (CMI) scheme, which will
be shown to form a two-user uniquely decodable scheme, is introduced and
described in this section. The MC and CMI codes are widely used for high speed data
transmission such as: TF-34, F-32 and CIT [11-12]. In a MC, a low-to-high level
(-1 1) transformation during the symbol interval T indicates a logical zero at the
encoder input, while a high-to-low transformation (1 -1) indicates a logical one. This
code is also known as a self-synchronising code, since the data and the clock are
combined. Each encoded bit contains a transition at the midpoint of a bit period, as is
shown in Figure 2.




Fig. 2. Signal Waveform of the Manchester Code 

In the CMI coding sequence (see Figure 3), input data 0s are encoded by (-11). 
Input data 1s, on the other hand, are encoded alternately as (-1-1) and (11), so 
that successive 1s have opposite polarity. 
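
The two encoding rules described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the choice of (-1-1) as the word used for the first input 1 in the CMI encoder is an assumption, since the text only states that the two words alternate.

```python
def manchester_encode(bits):
    """Manchester code: 0 -> (-1, 1) (low-to-high), 1 -> (1, -1) (high-to-low)."""
    out = []
    for b in bits:
        out.extend((1, -1) if b else (-1, 1))
    return out


def cmi_encode(bits, first_one=(-1, -1)):
    """CMI code: 0 -> (-1, 1); successive 1s alternate between (-1, -1) and (1, 1).

    The polarity used for the first 1 (first_one) is an assumption of this sketch.
    """
    out = []
    next_one = first_one
    for b in bits:
        if b:
            out.extend(next_one)
            # Flip the polarity so that the next 1 uses the opposite word.
            next_one = (-next_one[0], -next_one[1])
        else:
            out.extend((-1, 1))
    return out
```

Every Manchester word sums to zero, so the encoded stream is balanced for any input; in the CMI stream the running sum stays bounded because the two unbalanced words for 1 are used alternately.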




Fig. 3. Signal Waveform of the CMI Code 

An effective way of achieving DC-freedom on the base band adder channel is to 
ensure that all the component codes are themselves DC-free, since then their sum 
will also be DC-free [8]. This can be done by using a balanced set of code symbols 
(e.g., for q=2, {0, 1} maps to {1, -1}; for q=3, {0, 1, 2} maps to {1, 0, -1}, where 
q is the number of encoder output levels from each user), and by ensuring that all 
codewords either have zero disparity (i.e., the algebraic sum of the codeword 
symbols is zero) or exist in opposite disparity pairs which are used alternately. 
These mappings and pairings are standard line coding techniques [13]. The encoding 
table (Table 1), which will also be referred to as Example 1, gives the composite 
codeword outputs when the scheme is applied to the base band adder channel. This 
code is a mapping of a 2-user binary scheme presented in [1]. The original code from 
[1], $C_1$ = {00, 11} and $C_2$ = {00, 01, 10}, has a rate sum $R_{12}$ = 1.292 
information bits per channel use, which reduces to 1.0 when line coding is 
introduced. The reason for the reduction in the rate sum is that the binary words 11 
and -1-1 must be used alternately in order to keep the disparity at zero, and as a 
result are considered to be a single effective codeword. It is interesting to note 
that the CMI code can be represented as a binary convolutional code whose encoder 
has two states, each defined by the sign of the running digital sum (RDS) [14]. 
Trellis constructions for these and other codes are dealt with in section 4. 
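
As a quick check of the 2-user scheme of Example 1, the following Python sketch adds the user codewords symbol by symbol, as the adder channel does, and tests whether every pair of codewords gives a distinct composite output. The helper names are hypothetical; the two codeword sets are the row and column labels of Table 1.

```python
from itertools import product

USER1 = [(-1, 1), (1, -1)]            # balanced words of user 1
USER2 = [(-1, 1), (1, 1), (-1, -1)]   # (1, 1) and (-1, -1) form one effective word


def adder_channel(u, v):
    """Symbol-wise real addition of two user codewords."""
    return tuple(a + b for a, b in zip(u, v))


def composite_table(c1, c2):
    """Map every (user 1 word, user 2 word) pair to its composite output."""
    return {(u, v): adder_channel(u, v) for u, v in product(c1, c2)}


def uniquely_decodable(c1, c2):
    """True if no two codeword pairs produce the same composite output."""
    outputs = list(composite_table(c1, c2).values())
    return len(set(outputs)) == len(outputs)
```

For the sets above, `uniquely_decodable(USER1, USER2)` holds, whereas giving both users the same pair of balanced words does not, because (-1, 1) + (1, -1) and (1, -1) + (-1, 1) both yield (0, 0).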

Table 1. Output from a 2-user Line Coding Scheme 

User 2 \ User 1     -11      1-1 
-11                 -22      00 
11                  02       20 
-1-1                -20      0-2 



3 Construction of Multi-user Line Codes with Higher Rates 

In this section, a code combining method known as the Direct Sum Construction is 
described with examples. The direct sum technique has been used previously by 
Chang [15] to construct multi-user coding schemes for the MAAC. He called it 
concatenation, and noted the advantage of being able to form longer multi-user codes 
whilst maintaining the same overall rate. Chang was not concerned with the DC-free 
case, however, and so did not note the increase in rate that occurs when constructing 
longer DC-free multi-user coding schemes using this method. 



3.1 Definition 

Given two block codes $C_a$ and $C_b$, with lengths $n_a$ and $n_b$, having $N_a$ 
and $N_b$ codewords respectively, their direct sum $C_{ab}$ consists of all 
codewords $|u \mid v|$ where $u \in C_a$ and $v \in C_b$ [8]. This code (and 
similarly the code $C_{ba}$ consisting of all $|v \mid u|$) has length $n_a + n_b$ 
and $N_a N_b$ codewords. If both $C_a$ and $C_b$ are DC-free, then $C_{ab}$ and 
$C_{ba}$ are also DC-free. If {$C_a$, $C_b$} form a 2-user coding scheme on the 
adder channel, then so do the pairs of codes $C_{ab}$ and $C_{ba}$ (cross direct 
sum), and $C_{aa}$ and $C_{bb}$ (self direct sum). 
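
The definition can be illustrated with a minimal Python sketch in which codewords are tuples and the direct sum is plain concatenation of every pair of words. The function names are hypothetical, and `all_balanced` only tests the zero-disparity case; it does not model codes that rely on alternating disparity pairs.

```python
from itertools import product


def direct_sum(ca, cb):
    """All concatenations |u | v| with u taken from ca and v taken from cb."""
    return [u + v for u, v in product(ca, cb)]


def all_balanced(code):
    """True if every word has zero disparity (its symbols sum to zero)."""
    return all(sum(word) == 0 for word in code)
```

For example, `direct_sum(c1, c1)` with `c1 = [(1, -1), (-1, 1)]` gives four balanced words of length 4, in line with the length and codeword count stated in the definition.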



3.2 Examples 

Applying the self direct sum construction gives the following scheme: 

Example 2: 

$C_{11}$ = {1-11-1, 1-1-11, -111-1, -11-11} 

$C_{22}$ = {1-11-1, 11-1-1, -1-111, 1-111/1-1-1-1, 111-1/-1-11-1, 1111/-1-1-1-1} 

which have 4 and 6 effective codewords respectively. Note that $C_{11}$ consists of 
balanced (zero disparity) words only. Now $R_{11}$ = 0.5 and $R_{22}$ = 0.646, so 
$R_{1122}$ = 1.146, which is greater than the rate sum $R_{12}$ = 1.0 of the $C_1$, 
$C_2$ scheme. The reason for the higher rate is that codewords with opposite 
disparities in the original shorter codes form codewords with zero disparity in the 
longer direct sum code. Thus complementary pairs forming single effective codewords 
in $C_2$ become two effective codewords in the direct sum code $C_{22}$. This does 
not occur in $C_{11}$, as $C_1$ is already all balanced, so $R_{11}$ = $R_1$ = 0.5. 
The direct sum process can be iterated to form longer and longer coding schemes with 
increasing rate $R_{22}$ and overall rate $R_{1122}$, as shown in Table 2 for this 
code. In the limit as n increases, the overall rate is bounded by the overall rate 
of the original code when regarded as a non-DC-free code, i.e., counting all 
codewords as effective. Thus for Example 2, the asymptotic rate is 1.292, and 
therefore in the limit there is no penalty in requiring the scheme to be DC-free. 

Example 3: 

Another DC-free uniquely decodable 2-user coding scheme (Example 3) has n=2, 
$q_1$=2, $q_2$=3, Q=5 and component codes 

$C_1$ = {1-1, -11} and 

$C_2$ = {00, 1-1, 01/0-1, 10/-10, 11/-1-1}. 

Here, $R_1$ = 0.5, because $C_1$ is a binary code, and $R_2$ = 0.732, as $C_2$ is a 
ternary code which has 5 effective codewords. This scheme has $R_{12}$ = 1.232 
bits/channel use, the highest rate for the parameters concerned. The iterated rates 
$R_{22}$ and overall rates $R_{1122}$ for the direct summing of the 2-user scheme of 
this example are also shown in Table 2. Here $R_{11}$ = $R_1$ remains constant at 
0.5, and $R_{22}$ > $R_2$ rapidly increases. The asymptotic limit on $R_{1122}$ is 
1.446 information bits per channel use. In practice, this limit seems to be 
approached fairly rapidly, and it is interesting to note that this asymptotic rate 
is quite close to that of the best 2-user binary adder channel coding schemes found 
so far [3-4]. 
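
The growth of the rate under iterated self direct summing can be reproduced with a short Python sketch. It rests on one assumption that is consistent with the examples above but is not stated as a formula in the text: the code is closed under negation, so the number of effective codewords is the number of balanced words plus half the number of unbalanced words. All function names are hypothetical.

```python
import math
from collections import Counter


def disparity_counts(code):
    """Histogram of disparities over ALL words (both members of every pair)."""
    return Counter(sum(word) for word in code)


def self_direct_sum(counts):
    """Disparity histogram of the self direct sum (a discrete self-convolution)."""
    out = Counter()
    for d1, n1 in counts.items():
        for d2, n2 in counts.items():
            out[d1 + d2] += n1 * n2
    return out


def effective_codewords(counts):
    """Balanced words count once; each +/- disparity pair counts once.

    Assumes the word set is closed under negation.
    """
    total = sum(counts.values())
    return (total + counts[0]) // 2


def rate(counts, length, q=2):
    """Rate per channel use, with the logarithm taken to base q."""
    return math.log(effective_codewords(counts), q) / length


def iterated_rates(code, length, iterations, q=2):
    """List of (n, rate) pairs for the code and its iterated self direct sums."""
    counts, n, rates = disparity_counts(code), length, []
    for _ in range(iterations):
        rates.append((n, rate(counts, n, q)))
        counts, n = self_direct_sum(counts), 2 * n
    return rates
```

Called as `iterated_rates([(1, -1), (1, 1), (-1, -1)], 2, 4)`, the sketch gives rates of about 0.5, 0.646, 0.705 and 0.744 for n = 2, 4, 8 and 16, which are the binary $R_{22}$ values quoted in the text and in Table 2. Working on the disparity histogram rather than on the codeword list keeps the computation small even for long codes.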



Table 2. Rate of 2-user DC-free Schemes 

Length      Example 1             Example 2 
n           R22       R1122       R22       R1122 
2           0.5       1.0         0.733     1.233 
4           0.646     1.146       0.839     1.339 
8           0.705     1.205       0.884     1.384 
16          0.744     1.244       0.913     1.413 
32          0.766     1.266       0.928     1.428 
∞           0.792     1.292       0.946     1.446 



It is also possible to use cross direct summing to construct longer DC-free 
multi-user codes, but in every case considered it has been found that fewer 
codewords are generated, so the overall rate is lower (sometimes significantly) than 
with self direct summing. With the scheme of Example 2, for instance, $R_{1221}$ = 
1.0 = $R_{12}$, so no increase in overall rate occurs in this case. 



3.3 Direct Sum Codes with M>2 

The direct sum construction method can be applied to multi-user DC-free coding 
schemes with any number of users M. For example, a 3-user scheme exists with n=2, 
$q_1$=3, $q_2$=2 and $q_3$=3. The component codes are $C_1$ = {00, -11}, $C_2$ = 
{1-1, -11} and $C_3$ = {-11, 10/-10, 11/-1-1}, where $C_3$ has 3 effective words. 
The code rates are $R_1$ = 0.315, $R_2$ = 0.5 and $R_3$ = 0.5, and the overall rate 
is $R_{123}$ = 1.315. After the first iteration of the self direct sum construction, 
$R_{112233}$ = 1.616, and the asymptotic rate as n tends to infinity is 1.732. 



4 Trellis Construction of Multi-user Line Coding Schemes 

It is relatively straightforward to design trellises (in general non-linear) for 
the component codes of a multi-user coding scheme. It is also convenient to use 
disparity values to label the trellis states. If the codewords of a component code 
are all balanced, then the trellis is that of a block code. If, on the other hand, a 
component code contains complementary disparity pairs of codewords, then it is 
equivalent to a code with memory, and has a convolutional-type code trellis. The 
trellis of the composite coding scheme can then be obtained by using the Shannon 
Product construction [8,16]. Once this composite trellis is obtained, soft or hard 
decision Viterbi Algorithm (VA) decoding can be used to recover the information 
bits/symbols of each user [8]. Figures 4, 5 and 6 show the trellises for code 
$C_1$, code $C_2$ and the composite multi-user coding scheme of Example 1, 
respectively. 




Fig. 4. Trellis for Code $C_1$ 



Fig. 5. Trellis for Code $C_2$ 




Fig. 6. Combined trellis for Example 1 



Figure 7 shows the trellises for the first iteration of the self direct sum 
construction for the codes of Example 2 with block length n=4. 

Fig. 7. Trellises for the codes in Example 2 (Direct Sum) 

Various versions of the code trellises, and therefore of the composite trellis, can 
be constructed, which vary in their spectral properties. In the case of Example 3, 
there is no penalty in choosing the trellis with the best low frequency roll-off, as 
the decoding performance is almost identical for all the trellis versions 
investigated. The trellis structures shown in Figures 8, 9 and 10 differ in the 
number of paths emerging from and converging on the nodes at a given depth. The 
nodes are labelled with disparity values, as it has been shown that if the range of 
these values is closer to zero then better low frequency roll-off can be attained. 




[Trellis diagram; states labelled by disparity: 0, +1, +2, +3] 

Fig. 8. Trellis Structure 1 of Code = {00, 01/0-1, 10/-10, 11/-1-1}; the bold paths 
represent parallel paths 




[Trellis diagram; states labelled by disparity: -1, 0, +1, +2] 

Fig. 9. Trellis Structure 2: because the disparity range is closer to zero, the code 
has less magnitude in its power spectrum at low frequencies (nearer DC) 

[Trellis diagram; states labelled by disparity: -1, 0, +1, +2] 

Fig. 10. Trellis Structure 3: the structure has 4 branches from each node, making it 
homogeneous. 



From a complexity viewpoint, it is possible to set a limit on the maximum number 
of nodes required for a given DC-free code. If the maximum disparity value is 
$d_{max}$, then the minimum number of nodes per depth required in a trellis 
representing such a code is $N_t = 2 d_{max}$. For this to occur, it must be 
possible to have a branch with disparity either $+d_{max}$ or $-d_{max}$ from every 
node. This in itself sets the maximum number of nodes required in order to represent 
a code by a trellis. The following example in Figure 11 illustrates this. 



[Trellis diagram; states labelled by disparity: -1, 0, +1, +2] 

Fig. 11. Example for Code = {01/0-1, 11/-1-1} with $d_{max}$ = 2, giving $N_t$ = 4 
nodes. 

Finally, a note on the performance of these schemes under noisy conditions. The 
results of simulations (not shown here, but to be presented in the talk) indicate 
that the redundancy inherent in the coding schemes enables advantage to be taken of 
soft-decision decoding, even though the redundancy has not been structured to 
provide error control decoding power (the free distance of the trellis for $C_2$ in 
Example 1 is only 2, for example). 



5 Conclusions 

In this paper, multi-user DC-free coding schemes for the MAAC have been presented 
and described for M=2,3 users. It was shown that by using the direct sum construction 
on short multi-user codes, it is possible to devise longer DC-free multi-user coding 
schemes with rate sums which increase quite rapidly at each iteration of the 
construction. It was also shown that asymptotically there is no penalty in requiring 
the coding schemes to be DC-free. In addition, the schemes can be efficiently 
soft-decision decoded using a relatively low complexity sectionalised trellis. The 
redundancy in the coding schemes enhances their performance, especially in the 
soft-decision case. Additional structured redundancy designed to provide specific 
error-control power can be achieved by finding other methods of combining codes with 
a greater distance. The direct sum method appears to be very effective for the 
construction of both DC-free and non-DC-free multi-user coding schemes, with a rate 
gain for DC-free codes. 



Abstract. The multi-functional approach to design has been previously 
defined. This paper emphasizes the need for such an approach in order to 
reduce the increasing level of complexity of future communication systems. 
The paper then describes a class of deterministic quasi-analogue (multi-level) 
pseudorandom sequences with good correlation properties and the 
means for tailoring the sequences’ probability density functions. Finally, 
the family of sequences is shown to offer a vehicle for coalescing a number 
of aspects of multi-functionality within the context of spread-spectrum 
communication systems. 



1 Introduction 

The nature of communication systems is evolving towards a higher degree of 
mobility and availability of service. With this drive, future communication systems 
will be required to offer greater flexibility, leading to an increase in complexity. 

Previously, the multi-functional approach to design has been proposed as 
the means by which algorithmic complexity may be reduced, with no sacrifice 
in the overall system functionality, adaptation and performance [1]. Due to the 
multiplicity of functions that are expected from modern communication systems, 
multi-functionality has been recognised as an ideal that cannot be fully met in 
practice. 

This paper shows that several aspects of multi-functionality may be combined 
into one algorithm. In particular, the various functions of the physical-layer of 
a spread-spectrum (SS) communication system are coalesced into one algorithm 
using a class of quasi-analogue, or multi-level, pseudorandom sequences, known 
as trajectory-derived (TD) sequences. 

In section (2), the need for system flexibility and adaptivity is emphasized and 
the system parameters that can be made adaptable are identified. The concept of 
multi-functionality is then formally defined in section (3) and the various aspects 
of the concept are outlined. 

The synthesis procedure for TD-sequences is described in section (4) and 
example sequences are given. The dependency of the sequences upon initial 
conditions is also discussed and demonstrated. Next, various conditions are stated 
that guarantee algorithmic determinism. A closed-form expression for the sequences’ 
probability density function (PDF) is derived and compared against estimates 
of the PDF of the generated example sequences. It is also shown that the 
algorithm offers a simple mechanism for tailoring the PDF of the resultant 
sequences; two designs that yield sequences with Gaussian and uniform PDFs are 
specified and demonstrated. Formal definitions of the periodic/aperiodic 
real/complex sequences obtainable from TD-sequences are then given. A study of the 
sequences’ correlation properties is also made. In particular, it is shown that the 
sequences possess quasi-impulsive aperiodic autocorrelation functions (ACF), with 
the aperiodic crosscorrelation functions (CCF) between different sequences being 
low. 

The application of the TD family of sequences to multi-functional 
communication systems is treated in section (5). The various aspects of an M-ary 
phase-shift-keying (MPSK) spread-spectrum communication system, which are 
deliverable by TD-sequences, are explained; a number of simulation results are 
presented to verify the multi-functional system concept. 

Finally, concluding remarks are given in section (6). 

Notation: The following symbols and notation are used: $\mathcal{Z}$ denotes the 
set of all integers; $\mathcal{R}$ denotes the set of all real numbers; 
$\mathcal{C}$ denotes the set of all complex numbers; $\mathcal{R}e\,x$ denotes the 
real part of x; $\mathcal{I}m\,x$ denotes the imaginary part of x; the symbol $*$ 
denotes complex conjugation; $\otimes$ denotes convolution; 
$\langle p, q \rangle$ denotes p modulo q; the symbol $\forall$ means for all. 



2 System Adaptation 

Current trends show that demands on mobility and availability of service from 
communication systems will continue to rise. In order to sustain communication 
services under variable channel conditions, system flexibility has become a 
primary concern for system designers. System flexibility is greatly enhanced if 
adaptivity becomes the guiding theme during the process of system design. The 
main system parameters that can be made adaptable are [2]: 

— Occupied Bandwidth; 

— Transmission Rate; 

— Transmitted Signal Format; 

— Radiated Power; 

— Operating Frequency; 

— Control Protocol; 

— Spatial Selectivity. 



3 Multi-functionality 

The above list of adaptable parameters indicates that the level of sophistication 
of future communication systems will undoubtedly continue to increase, thus 
leading to systems with higher levels of complexity. Whilst computational power 
continues to increase by technology development, there remains a basic need to 
simplify system designs without sacrificing functionality and/or performance. 
The concept of multi-functionality has been defined as the means [1] 

by which the various elements of channel encoding / decoding applicable 
to adaptive communication systems could be coalesced in a systematic 
manner to achieve a more powerful and elegant algorithmic structure, 
together with a more efficient implementation within a digital signal 
processing architecture. 

From the definition, it is clear that multi-functionality is primarily concerned 
with the physical layer aspects of the communication process, which include : 

— Data-Encoding/Decoding; 

— Modulation/Demodulation; 

— Error-Control; 

— Synchronisation; 

— Security; 

— Multiple/Random Access. 

It may be said that a full multi-functional solution to the communication 
problem in its entirety is an ideal that is not likely to be fully achieved. This 
can be attributed to the diversity and complexity of the tasks that must be 
undertaken by the physical layer. However, a system designer must strive to 
reduce system complexity by focusing on techniques that achieve one particular 
aspect of communications whilst simultaneously offering at least one other 
‘free’ aspect of the system functionality. 

4 Trajectory-Derived Sequences 

Recently, there has been an interest in analogue-type pseudorandom sequences 
for application in digital data transmission systems [3]. In this section, 
TD-sequences are presented as an example of such sequences which offer sufficient 
degrees of randomness and versatility for use in multi-functional digital 
communications. 



4.1 Sequence Synthesis 

The sequences are generated via geometric ray-tracing. Here, a rectilinear path 
is plotted within a pre-defined, perfectly reflecting enclosure. The trajectory 
can be visualised as the path which would be taken by, say, a small ball ‘bouncing’ 
off the inner walls of the enclosure, with no energy loss. An example configuration 
is taken to be a square with side 2 units, centred on the origin of the Cartesian 
co-ordinate system and rotated by 45 deg. A circular reflector centred on the 
origin is also positioned inside the enclosure. The x and y coordinates of the 
successive points of impact define either a complex sequence or, alternatively, 
two real sequences. To indicate the generation method, these sequences have 
been termed trajectory-derived sequences. 
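
The ray-tracing procedure can be sketched in Python as follows. This is an illustrative reading of the description above rather than the authors' implementation: the function name, the default reflector radius of 0.5, the numerical guard values, and the choice to round both the impact point and the take-off angle to a fixed number of decimal places are all assumptions. The rotated square of side 2 is modelled as the region |x| + |y| ≤ √2.

```python
import math

SQRT2 = math.sqrt(2.0)


def td_sequence(start, angle_deg, n_impacts, radius=0.5, decimals=6):
    """Return the first n_impacts reflection points as (x, y) tuples.

    Enclosure: |x| + |y| <= sqrt(2) (square of side 2 rotated by 45 deg) with a
    circular reflector of the given radius centred on the origin.
    """
    x, y = start
    theta = math.radians(angle_deg)
    points = []
    for _ in range(n_impacts):
        dx, dy = math.cos(theta), math.sin(theta)
        best_t, normal = math.inf, None
        # Walls: sx*x + sy*y = sqrt(2), with outward unit normal (sx, sy)/sqrt(2).
        for sx in (1, -1):
            for sy in (1, -1):
                approach = sx * dx + sy * dy
                if approach > 1e-12:  # the path is heading towards this wall
                    t = max((SQRT2 - (sx * x + sy * y)) / approach, 0.0)
                    if t < best_t:
                        best_t, normal = t, (sx / SQRT2, sy / SQRT2)
        # Reflector: nearest root of |p + t*d|^2 = radius^2 when moving inwards.
        b = x * dx + y * dy
        disc = b * b - (x * x + y * y - radius * radius)
        if b < 0 and disc > 0:
            t = -b - math.sqrt(disc)
            if 1e-9 < t < best_t:
                best_t = t
                normal = ((x + t * dx) / radius, (y + t * dy) / radius)
        # Move to the impact point and reflect the direction: d' = d - 2 (d.n) n.
        x, y = x + best_t * dx, y + best_t * dy
        dot = dx * normal[0] + dy * normal[1]
        dx, dy = dx - 2 * dot * normal[0], dy - 2 * dot * normal[1]
        # Fixed-precision rounding keeps the trajectory exactly reproducible.
        x, y = round(x, decimals), round(y, decimals)
        theta = round(math.atan2(dy, dx), decimals)
        points.append((x, y))
    return points
```

A call such as `td_sequence((-1.0, 0.0), 47.0, 1024)` returns the impact points from which the x, y or complex sequences can be read off; repeating the call with the same arguments returns an identical list.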




[Plot of the enclosure and reflection points; horizontal axis: x-Component] 

Fig. 1. TD-Sequence Synthesis and Sensitivity to Initial Conditions. 



An example trajectory, $T_1$, was generated using the algorithm described 
above; figure 1 shows the enclosure and the first 13 reflections of the 
trajectory. The initial point was taken at (1,-1) with an initial angle of 47 deg. 
The trajectory was generated with a computational accuracy of 6 decimal places. 

[Plots of the sequence components: (a) x-Component; (b) y-Component] 

Fig. 2. The x and y Components of Trajectory $T_1$. 



4.2 Sensitivity to Initial Conditions 

A second trajectory, $T_2$, was also generated using the same enclosure 
configuration but with a small change in the starting x-coordinate value and the 
initial angle. The exact value of the starting point was (-1.01, -1) and the initial 
angle was chosen to be 47.01 deg. Figure 1 also shows the series of reflection 
points resulting from the second trajectory. The plot shows that the two 
trajectories start from almost the same point and with almost the same initial 
take-off angle. However, at the 5th reflection point the two trajectories begin to 
diverge and follow independent paths, and by the 7th reflection the trajectories 
display distinct behaviours. Thus, it is demonstrated that past points of reflection 
and the respective take-off angles have a major influence on the sequence 
characteristics. In order to ensure that the algorithm is deterministic, some form 
of precision control is required. A simple computational procedure that rounds to a 
fixed number of decimal places suffices to ensure algorithmic determinism. 



4.3 Parameters for Sequence Specification and Determinism 

The structure of TD-sequences is completely determined by the following factors: 

— Starting point and direction of initial path; 

— Geometric enclosure configuration; 

— Arithmetic precision of trajectory computation; 

— Number of successive impacts; 

— Choice of x, y or complex sequences. 

If all these factors are specified, then the sequence produced is deterministic and 
can be exactly regenerated. 



4.4 Derivation of the Sequence PDFs 



The PDF analysis of TD-sequences can be geometrically derived by considering 
the average projection of the reflection points on to the x and y axes. Assuming 
that the trajectories are uniformly distributed within the enclosure, the proba- 
bility of the reflection points being in any bin may then be expressed in terms 
of the total boundary and enclosure length existing in that bin. 

Consider the set of trajectories falling on the four inner enclosure walls. Let 
$P_e(x, \Delta x)$ denote the probability of the reflection points falling on the 
proportion of the boundary in the bin centred at x with width $\Delta x$. Then 

$$P_e(x, \Delta x) = \frac{2\,[\text{boundary length in the } (x, \Delta x)\text{th bin}]}{\text{total enclosure length}} \qquad (1)$$

$$= \frac{\left(x + \frac{\Delta x}{2}\right) - \left(x - \frac{\Delta x}{2}\right)}{2\sqrt{2}} = \frac{\Delta x}{2\sqrt{2}} \qquad (2)$$

In a similar manner, if $P_c(x, \Delta x)$ denotes the probability of the reflection 
points falling on the proportion of the circular reflector in the bin centred at x 
with width $\Delta x$, then 

$$P_c(x, \Delta x) = 2\,\frac{\text{arc length in the } (x, \Delta x)\text{th bin}}{\text{circumference of the reflector}} \qquad (3)$$



[Diagram of the circular reflector geometry] 

Fig. 3. Sequence PDF Computations. 



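Consider the reflector shown in figure 3: the arc length subtended by the angle 
$\theta$ (in radians) is given by arc length = $r \times \theta$. Using 
triangulation, the angle $\theta$, created by the points $(x_0, 0)$ and $(x_1, 0)$, 
may be shown to be 

$$\theta(x_0, x_1) = \alpha - \beta = \begin{cases} \sin^{-1}\!\left(\frac{\sqrt{r^2 - x_0^2}}{r}\right) - \sin^{-1}\!\left(\frac{\sqrt{r^2 - x_1^2}}{r}\right), & |x_0| < |x_1| \\ \sin^{-1}\!\left(\frac{\sqrt{r^2 - x_1^2}}{r}\right) - \sin^{-1}\!\left(\frac{\sqrt{r^2 - x_0^2}}{r}\right), & |x_0| > |x_1| \end{cases} \qquad (4)$$

Letting $x_0 = x - \Delta x/2$ and $x_1 = x + \Delta x/2$ in equation 4 expresses 
the angle $\theta$ in terms of the $(x, \Delta x)$th bin, and substituting the 
result into equation 3 yields 

$$P_c(x, \Delta x) = \begin{cases} \frac{1}{\pi}\left[\sin^{-1}\!\left(\frac{\sqrt{r^2 - (x - \Delta x/2)^2}}{r}\right) - \sin^{-1}\!\left(\frac{\sqrt{r^2 - (x + \Delta x/2)^2}}{r}\right)\right], & x > 0 \\ \frac{1}{\pi}\left[\sin^{-1}\!\left(\frac{\sqrt{r^2 - (x + \Delta x/2)^2}}{r}\right) - \sin^{-1}\!\left(\frac{\sqrt{r^2 - (x - \Delta x/2)^2}}{r}\right)\right], & x < 0 \end{cases} \qquad (5)$$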



If they exist, the probability density functions $p_e(x)$ and $p_c(x)$ are related 
to the obtained probability functions $P_e(x, \Delta x)$ and $P_c(x, \Delta x)$ by 
[4] 

$$\int_{x - \Delta x/2}^{x + \Delta x/2} p_e(x')\,dx' = P_e(x, \Delta x) \;\Rightarrow\; p_e(x) = \frac{1}{2\sqrt{2}} \qquad (6)$$

$$\int_{x - \Delta x/2}^{x + \Delta x/2} p_c(x')\,dx' = P_c(x, \Delta x) \;\Rightarrow\; p_c(x) = \frac{1}{\pi\sqrt{r^2 - x^2}} \qquad (7)$$

[Plots: (a) Theoretical; (b) PDF of x-Component of $T_1$] 

Fig. 4. The PDF. 



If p denotes the probability of the trajectory landing on the reflector, then 
the sequence density function is 

$$p(x) = \begin{cases} \frac{p}{\pi\sqrt{r^2 - x^2}} + \frac{1 - p}{2\sqrt{2}}, & |x| < r \\ \frac{1 - p}{2\sqrt{2}}, & \text{otherwise} \end{cases} \qquad (9)$$

A plot of the theoretical PDF is shown in figure 4(a) for r = 0.5 and p = 0.5, 
whilst an estimate of the PDF of the x component is given in figure 4(b). 
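
The bin probabilities and the resulting density can be evaluated numerically with the Python sketch below. The function names are hypothetical. Two points are assumptions of the sketch rather than statements from the text: the reflector term is written with the arccosine, which agrees with the inverse-sine form for bins lying on one side of the origin and also handles a bin that straddles x = 0, and the density is taken to be zero outside the enclosure, |x| > √2.

```python
import math

SQRT2 = math.sqrt(2.0)


def wall_bin_probability(x, dx):
    """Share of wall impacts in the bin; uniform in x over [-sqrt(2), sqrt(2)]."""
    lo = max(x - dx / 2, -SQRT2)
    hi = min(x + dx / 2, SQRT2)
    return max(hi - lo, 0.0) / (2 * SQRT2)


def reflector_bin_probability(x, dx, r):
    """Share of reflector impacts in the bin: subtended angle divided by pi."""
    lo = min(max((x - dx / 2) / r, -1.0), 1.0)
    hi = min(max((x + dx / 2) / r, -1.0), 1.0)
    return (math.acos(lo) - math.acos(hi)) / math.pi


def sequence_pdf(x, r=0.5, p=0.5):
    """Mixture density: reflector term for |x| < r plus the uniform wall term."""
    if abs(x) > SQRT2:
        return 0.0  # assumption: no impacts can occur outside the enclosure
    wall = (1 - p) / (2 * SQRT2)
    if abs(x) < r:
        return p / (math.pi * math.sqrt(r * r - x * x)) + wall
    return wall
```

Summing `p * reflector_bin_probability(...) + (1 - p) * wall_bin_probability(...)` over a set of bins that covers the interval from -√2 to √2 gives 1, which is a convenient sanity check on the mixture.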



4.5 The Tailoring of Sequence PDFs 

The use of multiple reflectors and appropriate enclosure scaling constitutes a 
tool for tailoring the sequence PDF. In particular, the shape of the PDF may be 
controlled by the location, number and sizes of the internal circular reflectors, 
whilst the enclosure size determines the maximum and minimum values of the 
sequence. Since each reflector offers a PDF shaped by equation 9, equation 9 
constitutes the building block for any desired PDF. 

Figure 5(a) describes the enclosure design that may be used to synthesise 
a random sequence whose PDF is approximately Gaussian, whilst figure 5(b) 
gives the design that will yield pseudo-random sequences whose PDFs are 
approximately uniform. 

For the Gaussian case, a trajectory was generated using the arrangement of 
figure 5(a). The trajectory is specified by the starting point (√2, 0.0), a take-off 
angle of 20 deg and a computational accuracy of 6 decimal places. The PDF of the 
initial 10,000 points of reflection of the resultant x component was computed and 
is shown in figure 5(c). The mean and variance (the square of the standard 
deviation) of the TD Gaussian sequence were estimated as 0.0 and 0.45, respectively. 
The theoretical PDF corresponding to a zero-mean Gaussian distributed random 
variable with variance equal to 0.45 is also plotted in figure 5(c). It is found 
that the distribution of the generated sequence reasonably approximates the 
theoretical distribution. 

[Figure panels: (a) Gaussian Enclosure; (b) Pseudo-Random Enclosure; (c) PDF of 
Gaussian Sequence; (d) PDF of Pseudo-Random Sequence. Horizontal axis of the PDF 
plots: x-Component] 

Fig. 5. Generation of Sequences with Different PDFs. 

Similarly, to synthesise a pseudo-random sequence, another trajectory was 
generated using the enclosure of figure 5(b), and the PDF of the initial 10,000 
points of reflection of the resultant x component was computed and is shown in 
figure 5(d). The trajectory is specified by the starting point (-√2, 0.0), a 
take-off angle of 20 deg and a computational accuracy of 6 decimal places. The 
obtained PDF indicates that the different sequence levels are approximately 
equi-probable. 

4.6 Real and Complex Sequences 

Given a series of consecutive ordered pairs, $T = \{(x_i, y_i)\}$, describing a 
unique trajectory, a pair of real sequences, $s_1, s_2 \in \mathcal{R}$, may be 
defined by 

$$s_1 = \mathcal{R}e\, T \qquad (10)$$

$$s_2 = \mathcal{I}m\, T \qquad (11)$$

Similarly, one complex sequence, $s \in \mathcal{C}$, may be defined by 

$$s = T \qquad (12)$$



4.7 Periodic and Aperiodic Sequences 

For a periodic type of sequence, the selected elements are chosen in a cyclic 
manner. Hence, the resulting sequence is 

$$s = \{(x_i, y_i) : \forall\, \langle i + D, N \rangle\} \qquad (13)$$

where N is the required sequence period and D is the initial phase of the 
sequence. Real periodic sequences can be defined in a similar manner. For an 
aperiodic type of sequence, the period of the sequence is made infinite by letting 
$N \to \infty$. 

Fig. 6. Correlation of Aperiodic TD-Sequences. 



4.8 Correlation Analysis 

The CCF between two complex functions $f_1$ and $f_2$ is defined by [5] 

$$\varphi_{12}(\tau) = \int f_1^{*}(t)\, f_2(t + \tau)\, dt \qquad (14)$$

Thus, the ACF of a given function is obtained by computing $\varphi_{11}(\tau)$. 

A 1024-reflection trajectory, $T_1$, was generated and the aperiodic ACFs of the 
x and y components were computed; the trajectory is specified by the starting 
point (-1, 0), a take-off angle of 47 deg and a computational accuracy of 6 decimal 
places. The x and y components of the first 1024 trajectory reflections were 
employed as aperiodic sequences. Figures 6(a) and 6(b) show the obtained results, 
which indicate that the sequences’ aperiodic ACFs are of the quasi-impulsive type. 
In addition, the aperiodic CCF between the two considered sequences was computed 
and is shown in figure 6(c). Note that the amplitude of the CCF is of the same 
order as the sidelobes of the sequences’ ACFs. 

Similar low ACF and CCF sidelobes are obtainable for the periodic and 
complex sequences. 
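
The sequence definitions of sections 4.6 and 4.7 and the correlation of section 4.8 can be sketched together in Python. The function names are hypothetical, and the correlation is a discrete, finite-length counterpart of the CCF integral in which the first argument is conjugated and terms falling outside either sequence are treated as zero; both choices are assumptions of this sketch.

```python
def real_and_complex(trajectory):
    """Split a list of (x, y) impact points into s1 (x), s2 (y) and complex s."""
    s1 = [x for x, _ in trajectory]
    s2 = [y for _, y in trajectory]
    s = [complex(x, y) for x, y in trajectory]
    return s1, s2, s


def periodic(sequence, period, phase, length):
    """Cyclic read-out: output element k is sequence[(k + phase) mod period]."""
    return [sequence[(k + phase) % period] for k in range(length)]


def aperiodic_ccf(f1, f2, lag):
    """Sum over t of conj(f1[t]) * f2[t + lag], with zero outside the sequences."""
    total = 0
    for t in range(len(f1)):
        if 0 <= t + lag < len(f2):
            total += complex(f1[t]).conjugate() * f2[t + lag]
    return total


def aperiodic_acf(f, lag):
    """Aperiodic autocorrelation: the CCF of a sequence with itself."""
    return aperiodic_ccf(f, f, lag)
```

A quasi-impulsive ACF in this setting means that `aperiodic_acf(s, 0)` is large compared with the values at every non-zero lag.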



5 Multi-functional System Design 

In the previous section, the synthesis and properties of TD-sequences were 
described. The availability of many deterministic sequences, combined with their 
correlation and PDF properties, suggests that the sequences offer a simple 
mechanism for achieving multi-functionality within the context of SS communications. 



5.1 Aspects of Multi-functionality 

The applicability of TD-sequences to achieve different aspects of multi-functional 
communication systems will now be addressed. 



Data-Encoding: Given that the sequence ACFs are of the quasi-impulsive 
type, the sequences offer the ability to encode data using the method of 
sequence inversion keying with respect to the random data, resulting in binary 
communications. Extension to M-ary signalling is also possible [6]. 

Fig. 7. Performance of TD-Sequence Based MPSK DS-SS Communication System. 

A family of M-ary phase-shift-keying (MPSK) SS communication systems 
may be defined which makes use of TD-sequences as spreading codes [7]. In 
such systems, TD-sequences are encoded by two independent streams of random 
binary data, yielding the baseband version of the information-bearing signal. 
The two waveforms are then applied to a quadrature modulator for frequency 
up-conversion. At the receiver, the received signal is first quadrature 
demodulated and the data is then recovered by the application of two matched 
filters which are matched to replicas of the TD-sequences used at the transmitter. 
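
Sequence inversion keying on a single real branch can be sketched in Python as below. This is a simplified baseband illustration under stated assumptions: the function names are hypothetical, one period of the spreading sequence is sent per data bit, symbol timing is assumed known at the receiver, and the quadrature modulation stage is omitted.

```python
def sik_spread(bits, sequence):
    """Send the spreading sequence for a 1 and its inverse for a 0."""
    chips = []
    for bit in bits:
        sign = 1.0 if bit else -1.0
        chips.extend(sign * c for c in sequence)
    return chips


def sik_despread(chips, sequence):
    """Matched-filter detection: correlate each symbol with the sequence replica."""
    n = len(sequence)
    bits = []
    for start in range(0, len(chips) - n + 1, n):
        corr = sum(r * c for r, c in zip(chips[start:start + n], sequence))
        bits.append(1 if corr > 0 else 0)
    return bits
```

For a QPSK-type system of the kind described above, the same two functions would be applied independently to the in-phase and quadrature branches, each with its own sequence.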

The binary phase-shift-keying (BPSK) and quaternary phase-shift-keying 
(QPSK) members of the family of MPSK systems described above were simulated 
over an additive white Gaussian noise channel. The bit-error-rates (BER) 
for various values of the ratio of energy-per-bit to noise power spectral density, 
denoted by $E_b/N_0$, are plotted in figure 7 for the simulated systems, together 
with the theoretical BER for a BPSK modem given in [6]. The results clearly 
show that the use of TD-sequences as spreading codes causes no increase in the 
system BER. 



Spectral-Spreading: The process of sequence inversion keying (or bi-orthogonal 
encoding in general) results in the application of many sequence 
digits per data symbol. Thus, the transmitted signal is made to occupy a greater 
bandwidth than the data. Bandwidth spreading equips the system with the ability 
to mitigate the effects of multi-path channels [7]. 



Channel Estimation: Given that the sequence AGFs are of the quasi- 

impulsive type, channel estimation becomes possible. However, the measured 
responses will be somewhat inaccurate due to the imperfection of the AGE. 
Inverse-filtering methods may be used to account for such imperfections [8] . 

Let s{t) and r{t) denote the test signal applied to the channel input and the 
corresponding channel output, respectively. Then, the channel input and output 
are related by the convolution integral 

r{t) = G/i(t)s(t) (15) 

where h{t) is the channel impulse response. Estimates of h{t) may be obtained 
by convolving the channel output with the reciprocal of s(t), which is denoted 
by s{t). The signal s{t) and its reciprocal s{t) have the property that their cross- 
convolution is a perfect Dirac impulse, ie 

Gss = S{t) (16) 

This measurement method of the impulse response has been termed generalised 
channel identification since it constitutes a generalisation of the classical ap- 
proach in which the estimate of the channel impulse response is obtained by 
cross-correlating the channel output with a replica of the applied test signal [9] . 
The correlation-based classical method suffers from a deterministic error due
to the non-zero sidelobes of the test signal ACF; this error is exactly given by

$e(\tau) = \int h(u)\, g(\tau - u)\, du$    (17)

where $g(\tau)$ is the sidelobe of the test signal ACF at delay $\tau$. When employed,
the generalised approach completely eliminates this error [8].
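The difference between the two estimators can be illustrated with a small discrete-time sketch. It is not the authors' simulation: it assumes a periodic (circular) channel model, uses a length-31 binary m-sequence as a stand-in for a TD-sequence, and builds the reciprocal sequence by inverting the DFT of the test signal. All function names and the four-path tap values are hypothetical.

```python
import cmath

def m_sequence(length=31):
    """+/-1 m-sequence from a[n+5] = a[n+2] XOR a[n] (x^5 + x^2 + 1); a stand-in test signal."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        bits.append(bits[-3] ^ bits[-5])
    return [1.0 - 2.0 * b for b in bits]

def circular_convolve(x, y):
    """Periodic convolution, the discrete analogue of equation (15)."""
    n = len(x)
    return [sum(x[m] * y[(k - m) % n] for m in range(n)) for k in range(n)]

def dft(x, sign):
    """Plain O(n^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def reciprocal(s):
    """Sequence whose circular convolution with s is a unit impulse (assumes no spectral zeros)."""
    inverse_spectrum = [1.0 / value for value in dft(s, -1)]
    return [value.real / len(s) for value in dft(inverse_spectrum, +1)]

N = 31
s = m_sequence(N)
h = [0.0] * N
for delay, gain in [(0, 1.0), (3, 0.5), (7, -0.3), (12, 0.2)]:  # hypothetical four-path channel
    h[delay] = gain
r = circular_convolve(h, s)  # noiseless channel output

# Classical estimate: cross-correlate the output with a replica of the test signal.
h_corr = [sum(r[m] * s[(m - tau) % N] for m in range(N)) / N for tau in range(N)]
# Generalised estimate: convolve the output with the reciprocal of the test signal.
h_gen = circular_convolve(r, reciprocal(s))

err_corr = max(abs(a - b) for a, b in zip(h_corr, h))
err_gen = max(abs(a - b) for a, b in zip(h_gen, h))
print(f"max error, correlation method: {err_corr:.4f}")
print(f"max error, inverse filtering:  {err_gen:.2e}")
```

In this noiseless setting the correlation estimate carries the sidelobe error of equation (17), here a constant offset because every ACF sidelobe of an m-sequence equals -1, whereas the inverse-filter estimate is exact up to rounding.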
Fig. 8. Four-Path Radio Channel Estimation Using TD-Sequences (SNR = 10 dB).



Figures 8(a) and 8(b) show the inphase (I) and quadrature (Q) components
of an artificial four-path radio channel, whilst the obtained channel estimates
are given in figures 8(c) and 8(d), respectively. The results show that
TD-sequences may be used effectively for the purpose of channel estimation within
the context of generalised channel identification. Note that the distortion in the
measured responses is solely due to the additive white Gaussian noise present in
the observed channel output; the noise was added at a signal-to-noise ratio of
20 dB.

Synchronisation: Again, the ACF property of the sequences means that 'free'
time-synchronisation becomes available at the receiver. Combined with
modulation-derived synchronisation, robust symbol timing recovery may be obtained
[10].

Figure 9 shows the initial period of the synchroniser output and the symbol 
synchronisation epoch flag. The synchroniser output consists of a rising series of 
impulses whose time positions correspond to symbol boundaries in the received 
signal. The magnitude of the impulses saturates when the synchroniser reaches 
the steady state part of its response, when the receiver is considered to be fully 
time-aligned with the received signal. 

The diagram also shows that the synchronisation epoch flag changes state
in an irregular fashion when the receiver is unsynchronised. However, once a
degree of time synchronisation is achieved, the flag changes state at regular
time intervals, corresponding to the symbol period.
Fig. 9. Timing Recovery in the MPSK SS Communication System.



Multi-user. The low CCF between any pair of TD-sequences enables their use
in a multi-user scenario, where a number of users can simultaneously access the
same channel, either in a synchronous or asynchronous manner [11].



Security. The bandwidth spreading feature of the sequences, combined with 
their sensitivity to initial conditions, implies that communication systems em- 
ploying TD-sequences as spreading codes have the following three naturally em- 
bedded security aspects. 

- Low Probability of Intercept (LPI): As a consequence of spectral-spreading,
the power spectral density of the transmitted signal is made to extend over
a greater bandwidth than that of the data. Thus, the signal becomes hidden
in the background noise, making it difficult for an interceptor to determine
the presence or absence of a transmission. In addition, the ability to tailor the
sequences' PDF contributes to the LPI system feature.

- Anti-Jamming: Spectral-spreading also leads to an increased system
anti-jamming capability, should the signal presence be detected by an eavesdropper.

- Privacy: The dependency of TD-sequences on the initial conditions (enclosure
configuration, starting points and initial angle) leads to some form of
system privacy, since any small error in the specification of the initial
conditions yields an entirely different trajectory. It has been established that the
CCF between different TD-sequences is low; thus, it follows that the message
transmitted by a communication system employing TD-sequences can
only be successfully recovered by the receiver(s) to whom the trajectory's
initial conditions are exactly known; this knowledge acts as a form of 'key'
without which the transmitted information cannot be optimally de-spread.
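The 'key' role of the initial conditions can be sketched numerically. The TD generator itself is not specified in this section, so the sketch below substitutes the logistic map $x \mapsto 4x(1-x)$ as an assumed stand-in chaotic source; the function names, the burn-in length and the key value are all hypothetical. It only illustrates the general mechanism: a receiver with the exact initial condition regenerates the spreading code, while one with a minutely wrong value obtains an essentially uncorrelated code.

```python
def chaotic_spreading_code(x0, length, burn_in=100):
    """Binary (+/-1) chips from the logistic map; a stand-in for a TD-sequence generator."""
    x = x0
    for _ in range(burn_in):          # discard the start so that nearby keys have diverged
        x = 4.0 * x * (1.0 - x)
    chips = []
    for _ in range(length):
        x = 4.0 * x * (1.0 - x)
        chips.append(1 if x >= 0.5 else -1)
    return chips

def normalised_correlation(a, b):
    """Zero-lag cross-correlation of two equal-length chip sequences, scaled to [-1, 1]."""
    return sum(p * q for p, q in zip(a, b)) / len(a)

key = 0.3                                             # hypothetical initial condition
tx_code = chaotic_spreading_code(key, 2000)
rx_right = chaotic_spreading_code(key, 2000)          # receiver knows the key exactly
rx_wrong = chaotic_spreading_code(key + 1e-10, 2000)  # receiver's key is off by 1e-10

matched = normalised_correlation(tx_code, rx_right)
mismatched = normalised_correlation(tx_code, rx_wrong)
print("correlation with exact key:    ", matched)
print("correlation with perturbed key:", mismatched)
```

De-spreading multiplies the received chips by the local code and sums, so the correlation value is directly the fraction of signal energy recovered.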

6 Concluding Remarks 

In this paper, the concept of multi-functionality was defined and identified as 
a means which leads to potential complexity reduction in future communica- 
tion systems. The TD method of generating a class of quasi-analogue pseudo- 
random sequences was then described and shown to offer the desired properties 
for achieving multi-functionality within an SS communication system. In
particular, the family of TD-sequences offers:

- many sequences;

- quasi-impulsive aperiodic ACF;

- low aperiodic CCF;

- determinism if, and only if, the initial conditions are accurately known;

- a mechanism for PDF tailoring.

These properties were exploited in the definition of a multi-functional MPSK
SS communication system in which TD-sequences facilitated data-encoding,
spectral-spreading, channel estimation, synchronisation, multi-user capability
and security. Simulation results have been presented to clarify and verify the
concepts.

In essence, this paper has demonstrated that a multi-functional algorithm
may be developed which fulfils many of the required tasks of the physical layer
of an SS communication system.
Abstract. In this paper the problem of improving the Delsarte
bound for $\tau$-designs in finite polynomial metric spaces is investigated.
First we distinguish the two cases of the Hamming and Johnson
Q-polynomial metric spaces and give exact intervals in which the Delsarte
bound can be improved. Secondly, we derive new bounds for
these cases. Analytical forms of the extremal polynomials of degree $\tau+2$
for non-antipodal PMS and of degree $\tau+3$ for antipodal PMS are given.
The new bound is investigated in the following asymptotic process:
in Hamming space when $\tau$ and $n$ grow simultaneously to infinity in a
proportional manner, and in Johnson space when $\tau$, $w$ and $n$ grow
simultaneously to infinity in a proportional manner. In both cases, the new
bound has better asymptotic behavior than the Delsarte bound.



1 Introduction 

Let $\mathcal{M}$ be a polynomial metric space (PMS) [14] with metric $d(x,y)$, standard
substitution $t = \sigma(d(x,y))$ and a normalized measure $\mu$, $\mu(\mathcal{M}) = 1$. In
describing the PMS we follow Levenshtein [14]. Any finite nonempty subset $C$ of
$\mathcal{M}$ is called a code. A code for which $\sigma(d(x,y)) \le \sigma(d)$ ($x,y \in C$, $x \ne y$) and $d$ is the
minimal distance of $C$ is an $(\mathcal{M}, |C|, \sigma)$-code. We consider some parameters of
the codes connected with their metric properties. For any $C \subseteq \mathcal{M}$, let $\Delta(C)$
denote the distance set of $C$, i.e. the set of values of $d(x,y)$ when $x,y \in C$. For any
code $C$ the minimal distance of $C$ is defined to be $d(C) = \min_{x,y \in C,\, x \ne y} d(x,y)$, and
the parameter $s(C) = |\Delta(C) \setminus \{0\}|$ characterizes the number of different distances
between distinct points of $C$. Obviously for any code $C$, $d(C) = \min(\Delta(C) \setminus \{0\})$.
The diameter of the whole space $\mathcal{M}$ can be defined as $D(\mathcal{M}) = \max(\Delta(\mathcal{M}) \setminus \{0\})$.
Correspondingly the diameter of the code $C$ is $D(C) = \max(\Delta(C) \setminus \{0\})$.

* The author was partially supported by NATO research fellowship and Concerted 
Research Action GOA-MEFISTO-666 of the Flemish Government 
Now let us introduce the concept of a $\tau$-design. For our purposes it is useful
to consider the code $C$ as a weighted set $(C, m)$, where $m$ is a certain
positive-valued function on $C$. We suppose that the weights $m(x)$ of elements $x$
are normalized. Hereafter we consider a code $C$ as a special case of a weighted
set, when $m(x) = 1$ for all $x \in C$. We will define a (weighted) $\tau$-design by means
of the strictly monotone real function (substitution) $\sigma(d)$ defined on the interval
$[0, D(\mathcal{M})]$.

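Definition 1. A weighted set $(C, m)$ will be referred to as a weighted $\tau$-design
in $\mathcal{M}$ with respect to the substitution $\sigma(d)$ if for any polynomial $f(t)$ in a real $t$
of degree at most $\tau$,

$\int_{\mathcal{M}} \int_{\mathcal{M}} f(\sigma(d(x,y)))\, d\mu(x)\, d\mu(y) = \frac{1}{|C|^2} \sum_{x,y \in C} f(\sigma(d(x,y)))\, m(x)\, m(y).$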



The $\tau$-design is a special case of the weighted $\tau$-design, when $m(x) = 1$ for
all $x \in C$. The maximum integer $\tau$ ($\tau \le s(\mathcal{M})$) such that a (weighted) set $C$ is
a (weighted) $\tau$-design is called the strength of $C$ and denoted by $\tau(C)$. Suppose
$\mathcal{M}$ is finite and $A(C) = \{d_0, d_1, \ldots, d_n\}$ is the distance distribution of $C$. The
dual distance distribution (the so-called MacWilliams transform [16, p.137]) of $C$ is
defined to be $A'(C) = \{d'_0, \ldots, d'_n\}$. The dual distance $d'(C)$ of the code $C$ is the
smallest $i$, $i = 1, \ldots, n$, such that $d'_i \ne 0$; the dual degree $s'(C)$ of $C$ is the
number of $i$, $i = 1, \ldots, n$, such that $d'_i \ne 0$. As proved in [3], $\tau(C) + 1 = d'(C)$.

The basic problem of coding theory is the construction of the maximum
(in cardinality) $\sigma$-code. Together with this problem there exists another one:
constructing the minimum (in cardinality) $\tau$-design (or equivalently a code with
dual distance $d' = \tau + 1$). As proved in [12,15,1], these two problems are dual.

The PMS $\mathcal{M} = (\mathcal{M}, d(x,y), \mu)$ with a given substitution $\sigma(d)$ is connected
with a system of orthogonal (with respect to the measure $\nu(t)$) polynomials $\{U_i(t)\}$ of degree
$i$, $i = 0, 1, \ldots, s(\mathcal{M})$, the so-called zonal spherical functions (ZSF). The function
$\nu(t)$ is equal to $1 - \mu(\sigma^{-1}(t))$ on the interval $[-1,1]$. So the system $U_i(t)$ is
defined by the substitution $\sigma(d)$ and measure $\mu(d)$. We can assume without loss
of generality that $\sigma(d)$ is a continuous strictly decreasing function on $[0, D(\mathcal{M})]$
such that $\sigma(D(\mathcal{M})) = -1 \le \sigma(d) \le \sigma(0) = 1$.

A polynomial metric space $\mathcal{M}$ is called antipodal if for every point $x \in \mathcal{M}$
there exists a point $\bar{x} \in \mathcal{M}$ such that for any point $y \in \mathcal{M}$ we have
$\sigma(d(x,y)) + \sigma(d(\bar{x},y)) = 0$.

Since $U_i(t)$ is a system of orthogonal polynomials, there exists a unique system
of positive constants $r_i$, $i = 0, \ldots, s(\mathcal{M})$, such that

$r_i \int_{-1}^{1} U_i(t)\, U_j(t)\, d\nu(t) = \delta_{i,j}$,  $U_i(1) = 1$,  $i, j = 0, 1, \ldots, s(\mathcal{M})$.    (1)

The integral on the left-hand side can be considered as a Lebesgue-Stieltjes
integral on $[-1, 1]$. When $\mathcal{M}$ is finite, $\nu(t)$ is left continuous and has $s(\mathcal{M}) + 1$ steps
at the points $t_i = \sigma(d_i)$ with positive step sizes $w_i$, $i = 0, 1, \ldots, s(\mathcal{M})$,
$\sum_{i=0}^{s(\mathcal{M})} w_i = 1$. Denote $w(t) = \nu'(t)$ if $\mathcal{M}$ is infinite. Since the measure is normalized we have
$U_0(t) = 1$, $r_0 = 1$. The orthogonality condition (1) for a finite $\mathcal{M}$ can be rewritten
in the following form: $r_i \sum_{k=0}^{s(\mathcal{M})} U_i(\sigma(d_k))\, U_j(\sigma(d_k))\, w_k = \delta_{i,j}$ for
$i, j = 0, 1, \ldots, s(\mathcal{M})$.

For arbitrary $a, b \in \{0, 1\}$ we define the system of polynomials $\{U_k^{a,b}(t)\}$
adjacent to $\{U_k(t)\}$, in a real $t$ of degree $k$, $k = 0, 1, \ldots, s(\mathcal{M}) - \delta_{a,1} - \delta_{b,1}$,
as follows.

First we define positive constants $c^{a,b}$ and the function $\nu^{a,b}(t)$. In the infinite
case $d\nu^{a,b}(t) = c^{a,b} (1 - t)^a (1 + t)^b w(t)\, dt$. For the finite case $\nu^{a,b}(t)$ is a
step-function whose steps are equal to $c^{a,b} (1 - \sigma(d_k))^a (1 + \sigma(d_k))^b w_k$ at the points $t_k = \sigma(d_k)$
($k = 0, 1, \ldots, s(\mathcal{M})$). The constants $c^{a,b}$ are chosen in such a way that the
Lebesgue-Stieltjes measure on $[-1,1]$ generated by the function $\nu^{a,b}(t)$ is
normalized, i.e. $\int_{-1}^{1} d\nu^{a,b}(t) = c^{a,b} \int_{-1}^{1} (1 - t)^a (1 + t)^b\, d\nu(t) = 1$ in the infinite
case. Correspondingly for the finite case we have

$c^{a,b} \sum_{k=0}^{s(\mathcal{M})} (1 - \sigma(d_k))^a (1 + \sigma(d_k))^b w_k = 1.$    (2)

Then the polynomials $U_k^{a,b}(t)$ together with the positive constants $r_k^{a,b}$ are
defined uniquely by the following orthogonality relations:

$r_i^{a,b} c^{a,b} \sum_{k=0}^{s(\mathcal{M})} U_i^{a,b}(\sigma(d_k))\, U_j^{a,b}(\sigma(d_k))\, (1 - \sigma(d_k))^a (1 + \sigma(d_k))^b w_k = \delta_{i,j}.$    (3)

It is easy to see that = 1, C/^’**(l) = 1, Tq’*' = 1. When a = 6 = 0 we 

omit the upper indices. Denote by —1 < < 1, i = 1, . . . , fc, the roots of 

the polynomial C/^’**(t), fc > 0, ordered in increasing order and Note 

that by the normalization = 1 the leading coefficient of polynomials 

Uk'^{t) is positive and sgnU^'’^{—l) = (—1)^ for fc > 0. Let us introduce the 
notation 

c'Fw-E-tF- 

i=0 



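For any integer $k$ and reals $x$ and $y$ we have the well-known Christoffel-Darboux
formulae

$\sum_{i=0}^{k} r_i U_i(x) U_i(y) = r_k m_k \frac{U_{k+1}(x) U_k(y) - U_k(x) U_{k+1}(y)}{x - y}$  if $x \ne y$,

$\sum_{i=0}^{k} r_i U_i(x) U_i(y) = r_k m_k \left( U'_{k+1}(x) U_k(x) - U'_k(x) U_{k+1}(x) \right)$  if $x = y$.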
It is known that for the ZSF the following recurrence formula holds:

$(t + m_i + c_i - 1)\, U_i(t) = m_i U_{i+1}(t) + c_i U_{i-1}(t)$,    (4)

for i > 0, where r_i = m_i = 0, , Ci = and C/_i(t) = 0, 

Uo{t) = 1. By definition 

n^\x,y) = J2rT'^Ut\x)Ut\y), (5) 

i=0 

For any non-negative integer k [10,14] the following relations between the adja- 
cent system of orthogonal polynomials hold: = 

We consider the Linear Programming Theorem due to Delsarte [3].

Theorem 1. Let $C \subset \mathcal{M}$ be an $(\mathcal{M}, |C|, \sigma)$-code (resp. $\tau$-design) and let $f(t)$
be a real non-zero polynomial such that

(A1) $f(t) \le 0$ for $-1 \le t \le \sigma$,

(resp. (B1) $f(t) \ge 0$ for $-1 \le t \le 1$),

(A2) the coefficients in the ZSF expansion $f(t) = \sum_{i=0}^{k} f_i U_i(t)$
satisfy $f_0 > 0$, $f_i \ge 0$ for $i = 1, \ldots, k$.

(resp. (B2) the coefficients in the ZSF expansion $f(t) = \sum_{i=0}^{k} f_i U_i(t)$
satisfy $f_0 > 0$, $f_i \le 0$ for $i = \tau + 1, \ldots, k$.)

Then $|C| \le \Omega(f) = f(1)/f_0$ (resp. $|C| \ge \Omega(f)$).
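As a small worked instance of the design half of the theorem, the following sketch evaluates $\Omega(f) = f(1)/f_0$ in the binary Hamming space with $n = 7$ and $\tau = 2$. It relies on two facts stated in the text: the substitution $t = 1 - 2d/n$ and, from the orthogonality relation with $U_0 = 1$ and $r_0 = 1$, $f_0 = \sum_k w_k f(\sigma(d_k))$. The test polynomial $f(t) = (t + 1/n)^2$ and the helper name are choices made for this illustration only.

```python
from fractions import Fraction
from math import comb

def delsarte_ratio(f, n):
    """Omega(f) = f(1)/f_0 in the binary Hamming space H_2^n with t = 1 - 2d/n."""
    weights = [Fraction(comb(n, d), 2 ** n) for d in range(n + 1)]   # measure w_d
    points = [1 - Fraction(2 * d, n) for d in range(n + 1)]          # t_d = sigma(d)
    # f_0 is the average of f over the measure, by orthogonality with U_0 = 1, r_0 = 1.
    f0 = sum(w * f(t) for w, t in zip(weights, points))
    return f(Fraction(1)) / f0

n = 7

def f(t):
    # A square is non-negative on [-1, 1], so (B1) holds; deg f = 2 = tau, so (B2) is vacuous.
    return (t + Fraction(1, n)) ** 2

omega = delsarte_ratio(f, n)
print(omega)  # lower bound on the size of a 2-design in H_2^7
```

Here $f_0 = 1/n + 1/n^2$ and $f(1) = (1 + 1/n)^2$, so $\Omega(f) = n + 1$, which is the classical Rao bound for strength 2.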

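We denote by $A_{\mathcal{M},\sigma}$ (resp. $B_{\mathcal{M},\tau}$) the set of real polynomials which satisfy the
conditions (A1) and (A2) (resp. (B1) and (B2)). We will consider the quantities
$A_U(\mathcal{M},\sigma) = \min\{\Omega(f) : f(t) \in A_{\mathcal{M},\sigma}\}$ and
$B_U(\mathcal{M},\tau) = \max\{\Omega(f) : f(t) \in B_{\mathcal{M},\tau}\}$.

The universal upper (resp. lower) bounds $L(\mathcal{M},\sigma)$ (resp. $D(\mathcal{M},\tau)$) for the
cardinality of an $(\mathcal{M}, |C|, \sigma)$-code (resp. a $\tau$-design) can be presented in the
following form [10,5]:

$|C| \le L(\mathcal{M},\sigma) = \left(1 - \frac{Q^{1,0}_{k-1+\varepsilon}(\sigma)}{Q^{0,\varepsilon}_{k}(\sigma)}\right) \sum_{i=0}^{k-1+\varepsilon} r_i$,    (6)

where $\varepsilon = 0$ if $t^{1,1}_{k-1} \le \sigma < t^{1,0}_{k}$ and $\varepsilon = 1$ if $t^{1,0}_{k} \le \sigma < t^{1,1}_{k}$
(here $t^{a,b}_{k}$ denotes the largest root of the adjacent polynomial of degree $k$), resp.

$|C| \ge D(\mathcal{M},\tau) = 2^{\theta} c^{0,\theta} \sum_{i=0}^{l} r_i^{0,\theta}$,    (7)

where $\theta \in \{0, 1\}$ and $\tau = 2l + \theta$.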
The bound (6) was obtained by Levenshtein [9] using the polynomial $f^{(\sigma)}(t)$
of degree $h(\sigma) = 2k - 1 + \varepsilon(\sigma)$. The polynomial $f^{(\sigma)}(t)$ was discovered by
Levenshtein [9]; the optimality proof was given by Sidel'nikov [22]. Later a
new proof was proposed by Levenshtein [10].

The problem of finding lower bounds on the minimum possible size of designs
in PMS has been considered by many authors. Delsarte in his seminal paper [3] derived
a general method for obtaining the bound and found it in the case $\tau = 2l$ for finite
PMS. Dunkl [6] obtained lower bounds when $\tau = 2l - 1$ for finite PMS, Rao
[19] for the Hamming space, and Ray-Chaudhuri and Wilson [20] for the Johnson
space. For infinite PMS, they were proved by Delsarte, Goethals and Seidel [5]
for the Euclidean sphere and by Hoggar [8] for the projective spaces. All these
bounds have become classical lower bounds in different PMS. A classical result by
Schoenberg and Szegő [23] shows that the polynomial

$f^{(\tau)}(t) = (t + 1)^{\theta} \left( Q_l^{1,\theta}(t) \right)^2$

of degree $\tau$ is optimal. The bound (6) is called the Levenshtein bound. The bound
(7) we will call the Delsarte bound for $\tau$-designs, and it will be the main object
of our investigations. Both of these bounds were derived by using the Linear
Programming Theorem.

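Definition 2. A polynomial $f(t) \in B_{\mathcal{M},\tau}$ is called $B_{\mathcal{M},\tau}$-extremal if

$\Omega(f) = \max\{\Omega(g) : g(t) \in B_{\mathcal{M},\tau},\ \deg(g) \le \deg(f)\}$.

A polynomial $f(t) \in A_{\mathcal{M},\sigma}$ is called $A_{\mathcal{M},\sigma}$-extremal if

$\Omega(f) = \min\{\Omega(g) : g(t) \in A_{\mathcal{M},\sigma},\ \deg(g) \le \deg(f)\}$.

It is known that $f^{(\sigma)}(t)$ and $f^{(\tau)}(t)$ are $A_{\mathcal{M},\sigma}$- and $B_{\mathcal{M},\tau}$-extremal of degree $h(\sigma)$
and $\tau$, respectively.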

PMS are finite metric spaces represented by P- and Q-polynomial association
schemes as well as infinite metric spaces. The most famous examples of finite
PMS are the Hamming, Johnson and Grassmann spaces. For finite PMS the system
of orthogonal polynomials $U$ is either $Q$ or $P$, and we use $\overline{U}$ for the other one.
A stronger version of Theorem 1 is valid for the finite spaces [3].

Theorem 2. Let $C \subset \mathcal{M}$ be an $(\mathcal{M}, |C|, \sigma)$-code (resp. $\tau$-design) and let
$f(t) = \sum_{i=0}^{k} f_i U_i(t)$ be a real non-zero polynomial such that

(A3) $f(1) > 0$, $f(\sigma(i)) \le 0$ for $i = d, \ldots, D(\mathcal{M})$,

(resp. (B3) $f(1) > 0$, $f(\sigma(i)) \ge 0$ for $i = 1, 2, \ldots, D(\mathcal{M})$),

(A2) $f_0 > 0$, $f_i \ge 0$ for $i = 1, \ldots, k$.

(resp. (B2) $f_0 > 0$, $f_i \le 0$ for $i = \tau + 1, \ldots, k$.)

Then $|C| \le \Omega(f)$ (resp. $|C| \ge \Omega(f)$).
Definition 3. Let $\mathcal{M}$ be a PMS. We define the quantities

$B^*_U(\mathcal{M},\tau) = \max\{\Omega(f) : f(t) \text{ satisfies (B3), (B2)}\}$,

$A^*_U(\mathcal{M},\sigma) = \min\{\Omega(f) : f(t) \text{ satisfies (A3), (A2)}\}$.

Obviously $B^*_U(\mathcal{M},\tau) \ge B_U(\mathcal{M},\tau)$ and $A^*_U(\mathcal{M},\sigma) \le A_U(\mathcal{M},\sigma)$.

As proved in [12,15,1], in finite PMS we have

$A^*_U(\mathcal{M},\sigma(d))\, B^*_{\overline{U}}(\mathcal{M}, d-1) = |\mathcal{M}|$.    (8)

So, optimal bounds for codes and designs in finite PMS can be obtained as a
solution of either problem.

As a consequence, the following bounds hold for the maximum cardinality of a
$\sigma$-code $C$ and the minimum cardinality of a $\tau$-design $D$:

$|C| \le A^*_Q(\mathcal{M},\sigma(d)) = \frac{|\mathcal{M}|}{B^*_P(\mathcal{M}, d-1)}$,

$|D| \ge B^*_Q(\mathcal{M},\tau) = \frac{|\mathcal{M}|}{A^*_P(\mathcal{M},\sigma(\tau+1))}$.    (9)


In this paper we focus on Hamming and Johnson spaces, represented by
Q-polynomial association schemes.

In [18] we improve the Delsarte bound in infinite PMS and present the
analytical form of the new bound for non-antipodal spaces. It turns out that these
results are valid for all PMS, in particular for Hamming and Johnson spaces.
In Section 2 we present these results without proofs, which can be found in
[18]. In Sections 3 and 4 we apply the method from [18] in Hamming and
Q-polynomial Johnson space, respectively. The asymptotic behavior of the bounds
is presented as well. In this paper we compare our results only with the
Delsarte bound, although it is well known that the Delsarte bound is not the
best possible (even asymptotically) in the finite PMS.



2 Preliminary Results 

We consider the following linear functional, which we will call test functions 
where are defined in [18]. 

This linear functional maps the set of real polynomials to the set of real
numbers. We have $-1 \le G_\tau(\mathcal{M}, Q_j) \le 1$ and $G_\tau(\mathcal{M}, f) = f_0$ for any polynomial
$f(t)$ of degree at most $\tau$. Also $G_\tau(\mathcal{M}, f) = f(1)/D(\mathcal{M},\tau)$ if $f(t)$ vanishes at the
zeros of $f^{(\tau)}(t)$.

Now we will give necessary and sufficient conditions for improvement of the 
Delsarte bound. Later on we will investigate some properties of the test functions, 
which turn out to be very useful. 
Theorem 3. The bound $D(\mathcal{M},\tau)$ can be improved by a polynomial $f(t) \in B_{\mathcal{M},\tau}$
of degree at least $\tau + 1$ if and only if $G_\tau(\mathcal{M}, Q_j) < 0$ for some $j \ge \tau + 1$.
Moreover, if $G_\tau(\mathcal{M}, Q_j) < 0$ for some $j \ge \tau + 1$, then $D(\mathcal{M},\tau)$ can be improved
by a polynomial in $B_{\mathcal{M},\tau}$ of degree $j$.

Lemma 1. Let $\mathcal{M}$ be antipodal. If $\tau$ and $j$ are odd, then $G_\tau(\mathcal{M}, Q_j) = 0$.

Corollary 1. Let $\mathcal{M}$ be an antipodal PMS. Then

$G_\tau(\mathcal{M}, Q_{\tau+2}) > 0$ for $\tau = 2k$,
$G_\tau(\mathcal{M}, Q_{\tau+2}) = 0$ for $\tau = 2k + 1$.

The investigation of the test functions for designs (Corollary 1) and the
obtained necessary and sufficient conditions for improving the Delsarte bound
show that the smallest possible degree of an improving polynomial is $\tau + 2$ for
non-antipodal and $\tau + 3$ for antipodal PMS.



Theorem 4. Let $\mathcal{M}$ be a non-antipodal PMS. Then, any $B_{\mathcal{M},\tau}$-extremal polynomial
of degree $\tau + 2$ ($\tau = 2k + \theta$) has the form

r + 2) = (1 + ty-<^[q{t + !) + (!- mrjQl’lf%{t) + QU^^t)]^, 
where $q$, $\eta$ are suitable constants.

Now we obtain the following analytical form of the new bound.

Theorem 5. Let $\mathcal{M}$ be a non-antipodal PMS. Then

$B(\mathcal{M},\tau) \ge S(\mathcal{M},\tau;\tau+2) = D(\mathcal{M},\tau-2) + R(\tau) = \Omega(f^{(\tau)}(t;\tau+2))$,

where $R(\tau)$ is a suitable constant.

Corollary 2. Let $\mathcal{M}$ be a non-antipodal PMS and let $\tau$ be an integer. Then
$S(\mathcal{M},\tau;\tau+2) > D(\mathcal{M},\tau)$ if and only if $G_\tau(\mathcal{M}, Q_{\tau+2}) < 0$.



Theorem 6. Let $\mathcal{M}$ be an antipodal PMS. Then, any $B_{\mathcal{M},\tau}$-extremal polynomial
of degree $\tau + 3$ ($\tau = 2k + \theta$) has the form

f^^'Ht', T + 3) = (1 + t)^[q{t + 1) + (1 - f)][riiQ]:^^{t) + r] 2 Qk^{t) + 0\i+i{t)Y 
where $q$, $\eta_1$, $\eta_2$ are suitable constants.

Theorem 7. Let $\mathcal{M}$ be an antipodal PMS. Then we derive a new bound

$B(\mathcal{M},\tau) \ge S(\mathcal{M},\tau;\tau+3) = D(\mathcal{M},\tau-3) + R_2(\tau) = \Omega(f^{(\tau)}(t;\tau+3))$,

where $R_2(\tau)$ is a certain constant.
3 Hamming Space

The Hamming space $\mathcal{M} = H_v^n$ ($n, v = 2, 3, \ldots$) consists of vectors $x =
(x_1, \ldots, x_n)$ where $x_i \in \{0, 1, \ldots, v-1\}$, with the distance $d(x,y)$ equal
to the number of coordinates in which the vectors $x$ and $y$ differ. The Hamming
space is distance invariant with the measure $w_k = v^{-n} \binom{n}{k} (v-1)^k$, $k = 0, 1, \ldots, n$.

This space is a self-dual (i.e. the systems $Q$ and $P$ coincide) polynomial graph [3]
with respect to the linear standard substitution $\sigma(d) = 1 - \frac{2d}{n}$, and we have
$s(\mathcal{M}) = n$, $D(\mathcal{M}) = n$, $d_k = k$. For $v = 2$ the Hamming space is antipodal and
for $v > 2$ it is non-antipodal. The system of ZSF $\{Q_k(t)\}_{k=0}^{n}$ can be defined by

$Q_k(t) = Q_k(\sigma(d)) = \frac{K_k^{n,v}(d)}{r_k}$, where $K_k^{n,v}(z) = \sum_{j=0}^{k} (-1)^j (v-1)^{k-j} \binom{z}{j} \binom{n-z}{k-j}$

is the Krawtchouk polynomial of degree $k$, and the corresponding constants are
$r_k = w_k v^n = \binom{n}{k} (v-1)^k$. Considering (4) one can find for this case

$m_k = \frac{2(v-1)(n-k)}{vn}$,  $c_k = \frac{2k}{vn}$.
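The Hamming-space quantities above can be cross-checked numerically. The sketch below uses the standard Krawtchouk polynomial, $K_k^{n,v}(z) = \sum_{j} (-1)^j (v-1)^{k-j} \binom{z}{j}\binom{n-z}{k-j}$, with $r_k = \binom{n}{k}(v-1)^k$, $m_k = 2(v-1)(n-k)/(vn)$ and $c_k = 2k/(vn)$, and verifies in exact rational arithmetic that the recurrence (4) holds at every point $t = 1 - 2d/n$. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def krawtchouk(k, z, n, v):
    """Krawtchouk polynomial K_k^{n,v}(z) for an integer 0 <= z <= n."""
    return sum((-1) ** j * (v - 1) ** (k - j) * comb(z, j) * comb(n - z, k - j)
               for j in range(k + 1))

def zsf(k, d, n, v):
    """Zonal spherical function Q_k(sigma(d)) = K_k(d) / r_k, normalised so that Q_k(1) = 1."""
    return Fraction(krawtchouk(k, d, n, v), comb(n, k) * (v - 1) ** k)

def recurrence_holds(n, v):
    """Check (t + m_i + c_i - 1) Q_i = m_i Q_{i+1} + c_i Q_{i-1} for i = 0..n-1, d = 0..n."""
    for d in range(n + 1):
        t = 1 - Fraction(2 * d, n)
        for i in range(n):
            m = Fraction(2 * (v - 1) * (n - i), v * n)
            c = Fraction(2 * i, v * n)
            previous = zsf(i - 1, d, n, v) if i > 0 else 0   # Q_{-1} = 0
            lhs = (t + m + c - 1) * zsf(i, d, n, v)
            rhs = m * zsf(i + 1, d, n, v) + c * previous
            if lhs != rhs:
                return False
    return True

print(recurrence_holds(6, 3), recurrence_holds(7, 2))
```

Using `Fraction` rather than floats makes the check an exact identity test instead of a tolerance comparison.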

The corresponding adjacent systems of polynomials and constants (2), (3), (5) 
as calculated by Levenshtein in [14] are 



= X 



0,1 



2 ’ 




1)* 



QVHd)) 



Kr^’fd) 



r 



0,1 

k 



for A: = 0, . . . , n — 1 



2(u-l)’ 



^1,0 



f-k'){v-if 



Ql'\<^(d)) 



Kr^’fd-1) 



for A: = 0, 



, n — 1 



^ nv^ 1,1 ^ (EU 

^ 4(n-l)(u-l)’ («-2)(i;_i)fc ’ 

T^n—2,v / 7 1 \ 

Ql’^{(j{d)) = — 7-^- — for A: = 0,...,n-2 

The designs in the Hamming space are called orthogonal arrays, commonly
denoted by $OA_\lambda(\tau, n, v)$. Their cardinality satisfies $|C| = \lambda v^{\tau}$.
Since $|\mathcal{M}| = v^n$, we will present well-known pairs of universal bounds, i.e.
inequalities which are valid for all codes $C \subseteq H_v^n$, which follow from Theorem 2.

The first pair is the Singleton bound [21] for a code $C \subseteq H_v^n$:

$v^{\tau} \le |C| \le v^{n-d+1}$,    (10)

where any of the bounds is attained if and only if $d + \tau = n + 1$.

The second pair of bounds is formed by the Rao [19] and Hamming [7] bounds
for a code $C \subseteq H_v^n$:

$D(H_v^n, \tau) \le |C| \le \frac{v^n}{D(H_v^n, d-1)}$.    (11)

Codes whose cardinality is equal to the left-hand side or the right-hand side
of (11) are called tight designs and perfect codes, respectively. Notice that for
$\tau = 2l + \theta$ we have $D(H_v^n, \tau) = v^{\theta} \sum_{i=0}^{l} \binom{n-\theta}{i} (v-1)^i$.
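The Rao and Hamming bounds in this pair are easy to evaluate. The sketch below takes the closed form of the Delsarte (Rao) bound in $H_v^n$ to be $D(H_v^n,\tau) = v^{\theta}\sum_{i=0}^{l}\binom{n-\theta}{i}(v-1)^i$ with $\tau = 2l+\theta$, which is the classical Rao bound, and obtains the Hamming bound as $v^n / D(H_v^n, d-1)$. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def rao_bound(n, v, tau):
    """Lower bound on the size of an orthogonal array of strength tau in H_v^n."""
    l, theta = divmod(tau, 2)
    return v ** theta * sum(comb(n - theta, i) * (v - 1) ** i for i in range(l + 1))

def hamming_bound(n, v, d):
    """Upper bound on the size of a code of minimum distance d in H_v^n."""
    return Fraction(v ** n, rao_bound(n, v, d - 1))

print(rao_bound(8, 2, 3))        # strength-3 binary arrays with 8 columns
print(hamming_bound(7, 2, 3))    # binary length 7, distance 3
print(hamming_bound(11, 3, 5))   # ternary length 11, distance 5
```

The last two values equal $2^4$ and $3^6$, the sizes of the binary Hamming code of length 7 and the ternary Golay code, which are the familiar perfect codes meeting the right-hand side of the pair.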

The third pair of universal bounds for any code $C \subseteq H_v^n$ is the Levenshtein
bound [12]:

$\frac{v^n}{L(H_v^n, \sigma(\tau+1))} \le |C| \le L(H_v^n, \sigma(d))$.    (12)

The first two pairs of bounds are obtained by means of combinatorial methods,
but all of them can be obtained using Theorem 1 or Theorem 2. In fact the
Delsarte bound for $\tau$-designs in Hamming space coincides with the Rao bound.

Now we give the exact intervals for the dimension n, when the test functions 
are negative. Further on we will omit the proofs, which are too technical. 



Theorem 8. Let $\mathcal{M} = H_v^n$, $v \ge 3$. Then $G_\tau(H_v^n, Q_{\tau+2}) < 0$ for

$\tau + 2 \le n \le \tau + \frac{v-2}{v-1}(k^2 + k)$  if $\tau = 2k$,

$\tau + 2 \le n \le \tau + k^2 + k$  if $\tau = 2k + 1$.



Theorem 9. Let $\mathcal{M} = H_v^n$, $v = 2$. Then $G_\tau(H_2^n, Q_{\tau+3}) < 0$ for

$\tau + 3 \le n \le \left\lfloor k^2/2 + 3k + 2 + \frac{1}{2}\sqrt{k^4 + 4k^3 + 4k^2 + 24k + 20} \right\rfloor$  if $\tau = 2k$,

$\tau + 3 \le n \le \tau + (k+1)^2$  if $\tau = 2k + 1$.

Applying (8), (9) for our bound we arrive at the following theorem.

Theorem 10. For a code $C \subseteq H_v^n$ the following bounds are valid:

$S(H_v^n, \tau; \tau + \varepsilon) \le |C| \le \frac{v^n}{S(H_v^n, d-1; d-1+\varepsilon)}$,

where $\varepsilon = 2$ and $\varepsilon = 3$ for non-antipodal and antipodal spaces, respectively.
We have investigated the following asymptotic behavior of the bound when $\tau$
grows with $n$.

k 

Theorem 11. Let v > 2 and t = 2k + 9. If lim — = S, where ^ > <5 > 0, then 

n— >oo rj ^ 

S'(iJ”,r;r + 2) ~ D{Hff ,T)const{v,5). 



The constant is bigger than 1 and it is equal to 

const{v, i5) = 1 + when t is odd and 



const{v, i5) = 1 + 



ij^(i;-2)(1-<5)^+i>^(1-(5)(1-(5+(5^)+2«(1-(5)5^+(53 

(t>-l)(l-<5)(i;2(i_5)+2i,5(l-5)+52) 



when T is even. 



Note that if i5 ^ 0 when r is even, then const ^ v, but when r is odd, then 
const ~ 1 + which increases, when 6 decreases. 

Let us consider the Rao-Hamming bounds (11). So far we have proposed a way to
improve the Delsarte (Rao) bound. In fact we can use the same test functions to
check when it is possible to improve the Hamming bound. So, the test functions
can be useful for investigating the existence of perfect codes.

One can define i?(n, d) = ■'^)) gf ^]^g j^gg^ code, where 

d) is the largest possible cardinality of the (il", ICI, CT(d))-code. For each 
real number 0 < 5 < 1 is defined 



R{6) = lim supn^aoR{n, d) 

to be the rate of a code, where $d/n \to \delta$. It is known that $R(0) = 1$, and $R(\delta) = 0$
for $1/2 \le \delta \le 1$. Obviously any bound for the cardinality of the codes (designs)
gives a bound for $R(\delta)$. The best known upper bound for the rate of a code was
found by McEliece, Rodemich, Rumsey and Welch (MRRW) [17]. Note that for
fixed $d$ and $\delta$ the Levenshtein bound is stronger than the MRRW bound; however, the
asymptotic forms of both bounds are the same. The best known lower bound
is the Gilbert-Varshamov (GV) bound [16]. Unfortunately the rates $R$ given by the new
bounds are the same as those of the classical ones (Rao or Hamming).



4 Johnson Space 

We consider the Johnson space $\mathcal{M} = J_w^n$ ($n = 2, 3, \ldots$; $w = 1, \ldots, \lfloor n/2 \rfloor$) as
the set of all $w$-subsets of the $n$-set $\{1, \ldots, n\}$, where the distance between two
elements $x, y \in J_w^n$ is defined to be $w - |x \cap y|$. The Johnson space can also be
considered as a subset of the binary Hamming space consisting of all vectors
which have exactly $w$ non-zero coordinates, with distance equal to half
of the Hamming distance. That is why codes in $J_w^n$ are usually called constant
weight codes. For $n = 2w$ the Johnson space is antipodal and for $n > 2w$ it is
non-antipodal.

It was shown by Delsarte [3] that the Johnson space is a P- and Q-polynomial 
graph. Here we consider only the properties of Q-polynomial Johnson space. Of 
course, the same calculation can be done in P-polynomial Johnson space. 
With the standard substitution $\sigma(d) = 1 - \frac{2d}{w}$ we obtain the Q-polynomial Johnson
space. We have $s(\mathcal{M}) = w$, $D(\mathcal{M}) = w$, $d_k = k$, measure constants
$w_k = \binom{w}{k}\binom{n-w}{k} / \binom{n}{w}$, $k = 0, 1, \ldots, w$, and constants
$r_k = \binom{n}{k} - \binom{n}{k-1}$, $k = 1, \ldots, w$ ($r_0 = 1$).

The system of ZSF $\{Q_k(t)\}_{k=0}^{w}$ in this case can be defined by

$Q_k(t) = Q_k(\sigma(d)) = J_k^{n,w}(d)$, where $J_k^{n,w}(d) = \sum_{j=0}^{k} (-1)^j \frac{\binom{k}{j}\binom{n+1-k}{j}}{\binom{w}{j}\binom{n-w}{j}} \binom{d}{j}$

are the Hahn polynomials of degree $k$.

Considering recurrence (4) one can find for this case

$m_k = \frac{2(w-k)(n-w-k)(n-k+1)}{w(n-2k)(n-2k+1)}$,  $c_k = \frac{2k(w-k+1)(n-w-k+1)}{w(n-2k+1)(n-2k+2)}$.
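The Johnson-space constants can be checked in the same spirit. The sketch below assumes the Hahn polynomial $J_k^{n,w}(d) = \sum_{j=0}^{k} (-1)^j \binom{k}{j}\binom{n+1-k}{j}\binom{d}{j} / \left(\binom{w}{j}\binom{n-w}{j}\right)$, together with $m_k = \frac{2(w-k)(n-w-k)(n-k+1)}{w(n-2k)(n-2k+1)}$ and $c_k = \frac{2k(w-k+1)(n-w-k+1)}{w(n-2k+1)(n-2k+2)}$. The factor $k$ in $c_k$ is an assumption of this sketch: it is not legible in the source, but it is needed for $c_0 = 0$ and for the recurrence (4) to hold in the cases tested. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def hahn(k, d, n, w):
    """Hahn polynomial J_k^{n,w}(d), normalised so that J_k(0) = 1 (requires k <= w <= n - w)."""
    return sum(Fraction((-1) ** j * comb(k, j) * comb(n + 1 - k, j) * comb(d, j),
                        comb(w, j) * comb(n - w, j))
               for j in range(k + 1))

def recurrence_holds(n, w):
    """Check (t + m_i + c_i - 1) Q_i = m_i Q_{i+1} + c_i Q_{i-1} for i = 0..w-1, d = 0..w."""
    for d in range(w + 1):
        t = 1 - Fraction(2 * d, w)
        for i in range(w):
            m = Fraction(2 * (w - i) * (n - w - i) * (n - i + 1),
                         w * (n - 2 * i) * (n - 2 * i + 1))
            c = Fraction(2 * i * (w - i + 1) * (n - w - i + 1),
                         w * (n - 2 * i + 1) * (n - 2 * i + 2))
            previous = hahn(i - 1, d, n, w) if i > 0 else 0   # Q_{-1} = 0
            lhs = (t + m + c - 1) * hahn(i, d, n, w)
            rhs = m * hahn(i + 1, d, n, w) + c * previous
            if lhs != rhs:
                return False
    return True

print(recurrence_holds(5, 2), recurrence_holds(4, 2))
```

The loop stops at $i = w - 1$ because for $n = 2w$ the expression for $m_w$ has a vanishing denominator as well as a vanishing numerator.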

The corresponding adjacent systems of polynomials (2), (3) as calculated by Lev- 
enshtein in [14] are 




0,1 _ n 

^ 2w’ 



,, 0,1 _ 



n — 1 



n — 1 
k-1 



Qk’^i^id-)) = Jk for fc = 0, ...,w — 1 



jn — l,w — l 






n 1,0 _ f to(n — 2k)(w — n) 

2(n — w)’ \kjn{w — k){w + k — n)’ 






(w - k)(n -w-k) Jk’^{d) - 

(n — 2k) d 



1^1 _ n(n — 1) 1,1 _ I “ f j (n — ui)(u; — l)(n — 2fc — 1) 

4(w— l)(n — ui)’ ^ V ^ j (n — l){n — w — k)(w — k — 1)' 



Qi’\a{d)) = 



{w — k — \){n — w — k) Jk 



\d) - 



(n — 2fc — 1) 



Notice that the values ^(1) = 1 coincide with the corresponding 

limits of the right-hand sides when d tends to 0. 

The designs in the Johnson space are called block designs, commonly denoted
by $S_\lambda(\tau, n, w)$. Their cardinality satisfies $|C| = \lambda \binom{n}{\tau} / \binom{w}{\tau}$.

Since $|\mathcal{M}| = \binom{n}{w}$, we will present well-known pairs of universal bounds, i.e.
inequalities which are valid for all codes $C \subseteq J_w^n$, which follow from Theorem 2.
The first pair is for a code $C \subseteq J_w^n$:

$\frac{\binom{n}{\tau}}{\binom{w}{\tau}} \le |C| \le \frac{\binom{n}{w+1-d}}{\binom{w}{w+1-d}}$,

where any of the bounds is attained if and only if $d + \tau = w + 1$.

The following universal pair of bounds is formed by the Ray-Chaudhuri and Wilson
[20] bound (for $\tau$ even) and the Levenshtein [14] bound (for $\tau$ odd) for a code $C \subseteq J_w^n$:

$D_Q(J_w^n, \tau) \le |C| \le \frac{\binom{n}{w}}{D_P(J_w^n, d-1)}$,



where Dq{J^,t) = (”)^(”/) and Dp{Jl,r) = Ei=o (“/)(" are the 
Delsarte bound for t = 21 -\- 0-design in Q— polynomial Johnson space and in 
P— polynomial Johnson space respectively. 
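For concreteness, the block count and the Q-polynomial Delsarte bound can be evaluated for two classical designs. The sketch assumes $|C| = \lambda\binom{n}{\tau}/\binom{w}{\tau}$ for an $S_\lambda(\tau,n,w)$ and reads the Delsarte bound as $D_Q(J_w^n,\tau) = (n/w)^{\theta}\binom{n-\theta}{l}$ with $\tau = 2l+\theta$; the latter is a reading of the partly illegible formula above, consistent with bound (7) and with the Ray-Chaudhuri-Wilson bound $\binom{n}{l}$ for even $\tau$. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def block_count(lam, tau, n, w):
    """Number of blocks of an S_lambda(tau, n, w) block design."""
    return Fraction(lam * comb(n, tau), comb(w, tau))

def delsarte_bound_johnson_q(n, w, tau):
    """Assumed form of D_Q(J_w^n, tau) = (n/w)^theta * C(n - theta, l), tau = 2l + theta."""
    l, theta = divmod(tau, 2)
    return Fraction(n, w) ** theta * comb(n - theta, l)

# Fano plane S(2, 3, 7) and the Witt design S(5, 8, 24), both with lambda = 1.
print(block_count(1, 2, 7, 3), delsarte_bound_johnson_q(7, 3, 2))
print(block_count(1, 5, 24, 8), delsarte_bound_johnson_q(24, 8, 5))
```

In both cases the number of blocks equals the bound, so under the assumed form these two designs are tight.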

The following bound was proved by Levenshtein [14]:

$\frac{\binom{n}{w}}{L_P(J_w^n, \sigma(\tau+1))} \le |C| \le L_Q(J_w^n, \sigma(d))$.

Now we are ready to investigate the Delsarte bound in this case. First we give
the exact intervals for $w$ and fixed $n$, where the test functions are negative.



Theorem 12. Let $\mathcal{M} = J_w^n$, $n > 2w$ (i.e. $\mathcal{M}$ is a non-antipodal space). Then
$G_\tau(J_w^n, Q_{\tau+2}) < 0$ for



t + 2<w<^~ ^\I- 



16fc3 J- (8 — 20n)A;2 J- (8n^ — 12n — 8)k J- 3n^ — — 4 

4fc^ J- 4fc J- n J- 1 



T J- 2 < ru < 



n J- 3fc J- 2 
2k‘^ + k + n/2 



if T = 2k. 
if T = 2k 1. 



Let us define the following functions of w 



Sodd{w) = -k {-2k^ - 20k - 18)w^ -k {37k^ -k 6fc® -k 50fc -k 14)u; 

-k 12 - 4fc"‘ - 21fc® -2k- 29k^ 

Sever,{w) = 32w'^ -k (-16fc^ - 288fc - 208)w® -k (1440fc -k H2fc® -k 1160fc^ -k 368)w® 
-k (-4156fc^ - 56 - 320^"^ - 2632fe® - 1840fc)w"‘ 

-k (6388fc® -k 3622fc^ -k 480fe® - 206 -k 96fe -k 3568^"^)^® 

-k (136fc^ -k 606fc - 5508fc"‘ - 3466fc® - 2864fc® - 400fc® -k 43)w^ 

-k (-594fc^ -k 1256fc® - 338fc® -k 2524fc® -k 1586fc"‘ -k 12 - 98k -k 176fc'^)w 
-k 162fc"‘ - 270fc® - 24fc - 32fc® - 12 -k 182fc® -k 22fe^ - 480fc® - 282kf 
Theorem 13. Let $\mathcal{M} = J_w^n$, $n = 2w$ (i.e. antipodal space). Then
$G_\tau(J_w^n, Q_{\tau+3}) < 0$ for

$\tau + 3 \le n \le \lfloor a_{even} \rfloor$  if $\tau = 2k$,

$\tau + 3 \le n \le \lfloor a_{odd} \rfloor$  if $\tau = 2k + 1$,

where $a_{odd}$ and $a_{even}$ are the greatest zeros of the polynomials $S_{odd}(w)$ and
$S_{even}(w)$, respectively.

Applying Theorem 5 and Theorem 7 we improve the Delsarte bound in
Q-polynomial Johnson space.

Theorem 14. For a code $C \subseteq J_w^n$ we have the following bound:

$S(J_w^n, \tau; \tau + \varepsilon) \le |C|$,

where $\varepsilon = 2$ if $n > 2w$ (non-antipodal Johnson space) and $\varepsilon = 3$ if $n = 2w$
(antipodal Johnson space).

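Finally, for non-antipodal Johnson space we present the asymptotic behavior
of the bound when $\tau$ grows with $n$ and $w$.

Theorem 15. Let $\mathcal{M}$ be a non-antipodal Johnson space ($n > 2w$). If
$\lim_{n \to \infty} \frac{\tau}{n} = \frac{\delta_1}{2}$ and $\lim_{n \to \infty} \frac{\tau}{w} = \delta_2$, where $0 < \delta_1 < \delta_2 < 1$, then

$S(\mathcal{M}, \tau; \tau + 2) \sim D(\mathcal{M}, \tau)\, \mathrm{const}(\delta_1, \delta_2)$.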

The constant is bigger than 1 and is equal to 
const{ 6 i,S 2 ) = when r is odd and 

const{Si, 62 ) = 2 Si^'^ ’ when r is even. 
Abstract. The applications of digital chaotic maps in discrete-time
chaotic cryptography and pseudo-random coding have been widely studied
recently. However, the statistical properties of digital chaotic maps are
rather different from those of the continuous ones, which impedes the theoretical
analysis of digital chaotic ciphers and pseudo-random coding. This
paper investigates in detail the statistical properties of a class of digital
piecewise linear chaotic maps (PLCM), and rigorously proves some useful
results. Based on the proved results, we further discuss some notable
problems in chaotic cryptography and pseudo-random coding employing
digital PLCM-s. Since the analytic methods proposed in this paper can
essentially be extended to a large number of PLCM-s, they will be valuable
for research on the performance of such maps in chaotic cryptography
and pseudo-random coding.



1 Introduction 

Chaotic systems have many interesting properties, such as sensitive
dependence on initial conditions and control parameters, ergodicity, mixing and
exactness [1]. Most of these properties can be connected with some
requirements in cryptography and pseudo-random coding [2,3,4]. Since the 1990s,
more and more researchers have contributed to a new field, chaotic
cryptography; many analog and digital chaotic encryption systems have been
proposed [2,5,6,7,3,8,9] and analysed [10,11,12]. As a general method for
designing chaotic stream ciphers, chaotic pseudo-random coding techniques are
commonly used to construct PRBG-s (Pseudo-Random Bit Generators) [5,6,9]. At
the same time, chaotic pseudo-random coding techniques have also developed
separately in other areas, such as electronics, communications [13,14,15] and
computer physics [16].
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As we know, piecewise linear chaotic maps (PLCM) are the simplest kind of 
chaotic maps from the viewpoint of realization. What’s more, they have uniform 
invariant density and good correlation functions [17], which is very useful for 
cryptography and pseudo-random coding [18]. In fact, many researchers have 
used them to realize chaotic ciphers and PRBG-s [14,9,8,6,7]. 

It seems that chaotic systems are a rich new source for cryptography
and pseudo-random coding. Unfortunately, when chaotic systems are realized in
finite computing precision, their digital dynamical properties are far different
from the continuous ones. Some severe problems arise, such as short cycle
length and non-ideal distribution and correlation functions. Assume that the
finite precision is $L$ bits and that fixed-point arithmetic is adopted. The
degradation is then caused by the following reasons: 1) All values represented
with finite precision are binary rational decimals of the form $a/2^L$
($a = 0 \sim 2^L - 1$). Since the Lebesgue measure of all such decimals is zero,
they cannot represent the right dynamical behaviors of chaotic systems defined
on a real interval with positive measure; 2) There are only $2^L$ digital values
to represent the chaotic orbits, so the cycle length of the orbits cannot be
larger than $2^L$, and generally it is much smaller than $2^L$; 3) The
quantization errors, which are introduced into the iterations of chaotic
systems, make the chaotic orbits depart from the theoretical ones in an
uncontrolled manner (it is impossible to know the exact errors).

Some researchers have noticed the degradation of digital chaotic systems
[10,11,9,19,20,13], and several remedies have been suggested: using higher
finite precision [11,19], the perturbation-based algorithm [9,20], and
cascading multiple chaotic systems [13]. Because it is difficult to measure the
statistical properties of digital chaotic maps theoretically, experiments are
generally used as the analytic tools to estimate the performance of the above
remedies. However, experiments alone are sometimes not enough to reveal the
true behavior of digital chaotic systems. Theoretical tools for digital chaotic
systems are therefore needed.



2 Outline of Our Works 

In this paper, we strictly prove some interesting statistical properties of a
class of digital PLCM with finite computing precision. Based on the proved
results, we can explain some statistical degradation of digital PLCM-s
theoretically. Such degradation makes chaotic ciphers insecure and makes
chaotic pseudo-random sequences unbalanced. Furthermore, we discuss the
performance of the three proposed remedies, and point out that none of them can
essentially remove such degradation. The perturbation-based algorithm is
nevertheless still useful in practice, since it can be carefully used to enhance
the performance of digital chaotic ciphers and pseudo-random coding.

For other digital chaotic maps, we have not yet obtained exact corresponding
results. But our proof techniques may conceptually be extended to many other
digital chaotic maps. If a chaotic map contains a control parameter that
is proportional to the uniformly distributed final output, the digital chaotic
map may be weak from the viewpoint of this control parameter. In the future, we
will try to find more delicate results.

This paper is organised as follows. In Sect. 3, we first introduce some
preliminary knowledge. In Sect. 4, we focus on the mathematically rigorous
proofs of the interesting properties of digital PLCM-s. Since the whole
proof is rather lengthy, it is divided into several parts. Based on the proved
properties, we explain what they mean for chaotic cryptography and
pseudo-random coding in Sect. 5. A brief conclusion is given in the last section.

3 Preliminary Knowledge 

3.1 Piecewise Linear Chaotic Map (PLCM) 

Generally, given a real interval $X = [\alpha, \beta] \subset \mathbb{R}$, a piecewise linear chaotic map
$F : X \to X$ is a multi-segmental map: $\forall i = 1 \sim m$, $F(x)|_{C_i} = F_i(x) = a_i x + b_i$, where
$\{C_i\}_{i=1}^{m}$ is a partition of $X$, which satisfies $\bigcup_{i=1}^{m} C_i = X$ and $C_i \cap C_j = \emptyset$, $\forall i \ne j$.
Each element of the partition is mapped to $X$ by $F_i$: $\forall i = 1 \sim m$, $F_i : C_i \to X$.
Such a map has the following statistical properties on its definition interval $X$:
1) it is chaotic, and its Lyapunov exponent $\lambda$ satisfies $0 < \lambda < \ln m$; 2) it is exact,
mixing and ergodic, and has the uniform invariant density function $f(x) = 1/(\beta - \alpha)$;
3) the correlation function
$r(n) = \frac{1}{\sigma^2} \lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \bar{x})(x_{i+n} - \bar{x})$
goes to zero as $n \to \infty$, where $\bar{x}$ and $\sigma^2$ are the mean value and the variance of $x$
respectively; in particular, if some conditions are satisfied, $r(n) = \delta(n)$ [17,1].

As we know [1], the uniform invariant density function means that uniform
input will generate uniform output, and that the chaotic orbit from almost every
initial condition will lead to the same uniform distribution $f(x) = 1/(\beta - \alpha)$.
But this is not true for a digital chaotic map: this paper will point out
that uniform digital input cannot generate uniform digital output for all control
parameters. This fact subsequently causes serious dynamical degradation
when the map is iterated again and again. Because it is inconvenient to analyse
chaotic maps with uncertain formulas, in this paper we focus our attention on
the following specific PLCM used in [8]:

$$F(x,p) = \begin{cases} x/p, & x \in [0,p) \\ (x-p)/(1/2-p), & x \in [p,1/2] \\ F(1-x,p), & x \in [1/2,1] \end{cases} \qquad (1)$$

where $p$ is the control parameter, which satisfies $0 < p < 1/2$.
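
The map (1) and a finite-precision version of it can be sketched in a few lines. This is only an illustrative sketch, not the implementation used in [8]: the function names are hypothetical, exact rationals stand in for fixed-point hardware, truncation (floor) is assumed as the quantizer, and a result equal to 1 is wrapped to 0.

```python
import math
from fractions import Fraction

def plcm(x, p):
    """Continuous PLCM (1) on exact rationals, for 0 < p < 1/2 and 0 <= x <= 1."""
    half = Fraction(1, 2)
    if x > half:                 # third branch: F(1 - x, p)
        x = 1 - x
    if x < p:                    # first branch
        return x / p
    return (x - p) / (half - p)  # second branch

def digital_plcm(n_x, n, p):
    """Truncate F(n_x / 2**n, p) to n bits; returns the numerator over 2**n."""
    y = plcm(Fraction(n_x, 2**n), p)
    return math.floor(y * 2**n) % 2**n   # assumption: the value 1 wraps to 0
```

For example, with `p = Fraction(3, 16)` and `n = 10`, the input `n_x = 1` gives `floor(16/3) = 5`, and the mirrored input `n_x = 1023` gives the same value.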

In order to facilitate the descriptions and proofs of the statistical properties 
in Sect. 4, we give some definitions in Sect. 3.2 and related results in Sect. 3.3. 

3.2 Preliminary Definitions 

Definition 1. A discrete set $S_n = \{a \mid a = \sum_{i=1}^{n} a_i \cdot 2^{-i}, a_i \in \{0,1\}\}$ is called
a digital set with resolution $n$; $\forall i < j$, $S_i$ is called the digital subset with
resolution $i$ of $S_j$. Specially, define $S_0 = \{0\}$, $S_\infty = [0,1)$; then we have
$\{0\} = S_0 \subset S_1 \subset \ldots \subset S_i \subset \ldots \subset S_\infty = [0,1)$.

Definition 2. Define $V_i = S_i - S_{i-1}$ ($i \ge 1$) and $V_0 = S_0$. $V_i$ ($0 \le i \le n$) is called
the digital layer with resolution $i$; $\forall p \in V_i$, $i$ is called the resolution of $p$. The
partition of $S_n$, $\{V_i\}_{i=0}^{n}$, is called the complete multi-resolution decomposition
of $S_n$; $\{V_i\}_{i=0}^{\infty}$ is called the complete multi-resolution decomposition
of $S_\infty = [0,1)$. For $S_n$, its resolution $n$ is also called the decomposition level,
$\bigcup_{i=0}^{n} V_i = S_n$ and $\forall i \ne j$, $V_i \cap V_j = \emptyset$.

Definition 3. $\forall n > m$, $D_{n,m} = S_n - S_m$ is called the digital difference set
of the two digital sets with parameters $n$ and $m$. $\{V_i\}_{i=m+1}^{n} = \{S_i - S_{i-1}\}_{i=m+1}^{n}$ is
called the complete multi-resolution decomposition of $D_{n,m}$, and $n - m + 1$ is
called the decomposition level.
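
As a small illustration of Definitions 1 and 2, the resolution of a value $x = N_x/2^n$ can be read off from the number of trailing zero bits of its numerator. The helper names below are hypothetical and values are represented by their integer numerators over $2^n$.

```python
def resolution(n_x, n):
    """Index i of the digital layer V_i that contains x = n_x / 2**n."""
    if n_x == 0:
        return 0                                   # V_0 = S_0 = {0}
    trailing_zeros = (n_x & -n_x).bit_length() - 1
    return n - trailing_zeros

def layer(i, n):
    """Numerators (over 2**n) of all elements of V_i."""
    return [n_x for n_x in range(2**n) if resolution(n_x, n) == i]
```

With `n = 10`, the parameter `3/16` has numerator 192 and resolution 4, and each layer $V_i$ ($i \ge 1$) contains $2^{i-1}$ values.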



Definition 4. A function $G : \mathbb{R} \to \mathbb{Z}$ is called an approximate transformation
function (ATF) if $\forall x \in \mathbb{R}$, $|G(x) - x| < 1$. Three basic ATF-s are: 1)
$\lfloor x \rfloor$ - the maximal integer not greater than $x$; 2) $\lceil x \rceil$ - the minimal integer not
less than $x$; 3) round($x$) - the rounded integer of $x$. $\forall x \in \mathbb{R}$, define its decimal
part $x - \lfloor x \rfloor$ as the function dec($x$). The above three ATF-s have the following
useful properties (please note that not all ATF-s do):

ATF Property 1: $\forall m \in \mathbb{Z}$, $G(x + m) = G(x) + m$; (2)

ATF Property 2: $a \le x \le b \Rightarrow \lfloor a \rfloor \le G(x) \le \lceil b \rceil$. (3)

The proofs of the two properties are rather simple, so we omit them here.

Definition 5. A function $G_n : S_\infty \to S_n$ is called a digital approximate
transformation function (DATF) with resolution $n$ if $\forall x \in S_\infty = [0,1)$,
$|G_n(x) - x| < 1/2^n$. The following three DATF-s are considered in this paper (they
are also the most frequently adopted DATF-s in digital computing algorithms):
1) $\mathrm{floor}_n(x) = \lfloor x \cdot 2^n \rfloor / 2^n$; 2) $\mathrm{ceil}_n(x) = \lceil x \cdot 2^n \rceil / 2^n$;
3) $\mathrm{round}_n(x) = \mathrm{round}(x \cdot 2^n)/2^n$.^1 The above three DATF-s have the following useful
properties (please note that not all DATF-s do):

DATF Property 1: $\forall m \in \mathbb{Z}$, $G_n(x + m/2^n) = G_n(x) + m/2^n$; (4)

DATF Property 2: $a \le x \le b \Rightarrow \mathrm{floor}_n(a) \le G_n(x) \le \mathrm{ceil}_n(b)$. (5)

The two properties are easily derived from ATF Properties 1-2.
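
The three DATF-s of Definition 5 can be sketched on exact rationals as follows. This is a minimal sketch under stated assumptions: the function names mirror the paper's notation, round($\cdot$) is taken to round halves upward (Python's built-in `round` rounds halves to even, so it is avoided), and the wrap of the value 1 to 0 described in footnote 1 is left out.

```python
import math
from fractions import Fraction

def floor_n(x, n):
    return Fraction(math.floor(x * 2**n), 2**n)

def ceil_n(x, n):
    return Fraction(math.ceil(x * 2**n), 2**n)

def round_n(x, n):
    # round half up, matching round(a) - 1/2 <= a < round(a) + 1/2
    return Fraction(math.floor(x * 2**n + Fraction(1, 2)), 2**n)
```

For `x = Fraction(1, 3)` and `n = 4` the three functions return 5/16, 3/8 and 5/16 respectively, and shifting `x` by a multiple of `1/2**n` shifts each result by the same amount (DATF Property 1).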

3.3 Preliminary Lemmas about the Three Basic ATF-s

For the three basic ATF-s, $\lfloor \cdot \rfloor$, $\lceil \cdot \rceil$ and round($\cdot$), we have two fundamental
lemmas and one corollary, which will be useful in the proofs of the theorems in the
next section.

^1 Considering $1 \notin S_\infty$, without loss of generality, define $\mathrm{ceil}_n(x) = 0$ if $\lceil x \cdot 2^n \rceil = 2^n$, and
define $\mathrm{round}_n(x) = 0$ if $\mathrm{round}(x \cdot 2^n) = 2^n$. Such redefinitions will not essentially
influence the following results since dec(1) = 0.




Statistical Properties of Digital Piecewise Linear Chaotic Maps 209 



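Lemma 1. $\forall n \in \mathbb{Z}^+$, $a \ge 0$, the following three facts are true:

1. $n \cdot \lfloor a \rfloor \le \lfloor n \cdot a \rfloor \le n \cdot \lfloor a \rfloor + (n-1)$, and $n \cdot \lfloor a \rfloor = \lfloor n \cdot a \rfloor$ when and only when
$\mathrm{dec}(a) \in [0, 1/n)$;

2. $n \cdot \lceil a \rceil - (n-1) \le \lceil n \cdot a \rceil \le n \cdot \lceil a \rceil$, and $n \cdot \lceil a \rceil = \lceil n \cdot a \rceil$ when and
only when $\mathrm{dec}(a) \in (1 - 1/n, 1) \cup \{0\}$;

3. $n \cdot \mathrm{round}(a) - \lfloor n/2 \rfloor \le \mathrm{round}(n \cdot a) \le n \cdot \mathrm{round}(a) + \lfloor n/2 \rfloor$, and
$n \cdot \mathrm{round}(a) = \mathrm{round}(n \cdot a)$ when and only when
$\mathrm{dec}(a) \in [0, \frac{1}{2n}) \cup [1 - \frac{1}{2n}, 1)$.

The proof of this lemma is given in Appendix A.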

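Corollary 1. $\forall n \in \mathbb{Z}^+$, $a \ge 0$, we have the following results:

1. $\lfloor n \cdot a \rfloor \equiv 0 \pmod n$ when and only when $\mathrm{dec}(a) \in [0, 1/n)$;

2. $\lceil n \cdot a \rceil \equiv 0 \pmod n$ when and only when $\mathrm{dec}(a) \in (1 - 1/n, 1) \cup \{0\}$;

3. $\mathrm{round}(n \cdot a) \equiv 0 \pmod n$ when and only when $\mathrm{dec}(a) \in [0, \frac{1}{2n}) \cup [1 - \frac{1}{2n}, 1)$.

Proof. This corollary can be derived directly from the above lemma.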
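
Corollary 1 is easy to spot-check by brute force on rational inputs. The sketch below is only a sanity check, not a proof; the helper names are hypothetical, and round($\cdot$) is assumed to round halves upward, in line with the definition used in Appendix A.

```python
import math
from fractions import Fraction

def dec(a):
    return a - math.floor(a)

def round_half_up(a):
    return math.floor(a + Fraction(1, 2))

def check_corollary1(n, denom):
    """Check the three equivalences for a = k/denom, 0 <= k < 3*denom."""
    for k in range(3 * denom):
        a = Fraction(k, denom)
        d = dec(a)
        in_floor_set = d < Fraction(1, n)
        in_ceil_set = d == 0 or d > 1 - Fraction(1, n)
        in_round_set = d < Fraction(1, 2 * n) or d >= 1 - Fraction(1, 2 * n)
        if (math.floor(n * a) % n == 0) != in_floor_set:
            return False
        if (math.ceil(n * a) % n == 0) != in_ceil_set:
            return False
        if (round_half_up(n * a) % n == 0) != in_round_set:
            return False
    return True
```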



Lemma 2. $\forall j, N, N' \in \mathbb{Z}^+$, where $N, N'$ are odd integers satisfying $2^j \mid (N + N')$,
we have $\lfloor N/2^j \rfloor + \lfloor N'/2^j \rfloor = (N + N')/2^j - 1$.

The proof of this lemma is given in Appendix B.
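
A quick exhaustive check of Lemma 2 over small odd integers can be written as follows; the function names are hypothetical and the check only illustrates the statement, it does not replace the proof.

```python
def lemma2_holds(j, big_n, big_n_prime):
    """floor(N/2^j) + floor(N'/2^j) == (N + N')/2^j - 1, using integer shifts."""
    return (big_n >> j) + (big_n_prime >> j) == ((big_n + big_n_prime) >> j) - 1

def check_lemma2(max_j, limit):
    for j in range(1, max_j + 1):
        for big_n in range(1, limit, 2):            # odd N
            for big_n_prime in range(1, limit, 2):  # odd N'
                divisible = (big_n + big_n_prime) % 2**j == 0
                if divisible and not lemma2_holds(j, big_n, big_n_prime):
                    return False
    return True
```

For instance, with $j = 2$, $N = 1$ and $N' = 3$ both floors are 0 and $(N + N')/2^j - 1 = 0$.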



4 Statistical Properties of Digital PLCM 

Given a one-dimensional chaotic map $F(x,p) : I \to I$, where $I = S_\infty = [0,1)$.
When the finite precision is $n$, its digital version can be expressed by
$F_n(x,p) = G_n \circ F(x,p) : S_n \to S_n$, where $G_n(\cdot)$ is a DATF: $\mathrm{floor}_n(\cdot)$, $\mathrm{ceil}_n(\cdot)$ or $\mathrm{round}_n(\cdot)$.
Denote the corresponding ATF of $G_n(\cdot)$ as $G_0(\cdot)$.

Assume $P_j$ denotes the probability that the lowest $j$ bits of $F_n(x,p)$ are all zeros,
i.e., the probability that $F_n(x,p)$ belongs to $S_{n-j}$: $P_j = P\{F_n(x,p) \in S_{n-j}\}$. For
the map denoted by (1)^2, $\forall p \in V_i \subset S_i \subset S_n$ ($2 \le i \le n$), we can deduce some
interesting results about $P_j$ ($1 \le j \le n$), which are rather different from the
expected ones based on the perfect continuous statistical properties of the map.
Moreover, the results can be essentially extended to all digital PLCM-s described
in Sect. 3.1.

Because the whole proof is rather lengthy, we divide it into several parts:
first a fundamental lemma, then the results about $P_j$ ($i \le j \le n$) and the ones
about $P_j$ ($1 \le j < i$), and finally two comprehensive theorems.

^2 Because $1 \notin S_\infty$, redefine $F_n(1/2,p) = 0$. Considering $F^2(1/2,p) \equiv 0$ and dec(1) = 0,
such a redefinition will not essentially influence the following results.
4.1 A Fundamental Lemma

First, we introduce Lemma 3, which gives some useful results about the highest
$n - i$ bits and the lowest $i$ bits of $F_n(x,p)$. This lemma is the foundation of
the following proofs. At the same time, it reflects some facts about the
local linearity of PLCM-s, which makes the results obtained in this paper
conceptually available for other PLCM-s.

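Lemma 3. $\forall p \in D_{i,0} = S_i - \{0\}$ ($1 \le i \le n$), $x \in S_n$. Assume $p = N_p/2^i$, $x = N_x/2^n$,
where $N_p$, $N_x$ are integers satisfying $1 \le N_p \le 2^i - 1$ and $0 \le N_x \le 2^n - 1$.
We have the following three results:

1. $G_n(x/p) \in S_{n-i} \Leftrightarrow N_x \equiv 0 \pmod{N_p}$; (6)

2. $\mathrm{floor}_{n-i}(G_n(x/p)) = \dfrac{\lfloor N_x/N_p \rfloor}{2^{n-i}}$; (7)

3. $G_n(x/p) \bmod \dfrac{1}{2^{n-i}} = \dfrac{G_0(2^i \cdot (N_x \bmod N_p)/N_p)}{2^n}$. (8)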

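Proof. Because $x/p = \dfrac{N_x/N_p}{2^{n-i}} = \dfrac{\lfloor N_x/N_p \rfloor + (N_x \bmod N_p)/N_p}{2^{n-i}}$, we
have $G_n(x/p) = G_0(2^i \cdot \lfloor N_x/N_p \rfloor + 2^i \cdot (N_x \bmod N_p)/N_p)/2^n$. From ATF Property
1, we can rewrite $G_n(x/p)$ as follows:

$$G_n(x/p) = \frac{\lfloor N_x/N_p \rfloor}{2^{n-i}} + \frac{G_0(2^i \cdot (N_x \bmod N_p)/N_p)}{2^n}. \qquad (9)$$

Let us discuss the above equation under the following two conditions:

a) When $N_x \bmod N_p = 0$: $G_n(x/p) = \lfloor N_x/N_p \rfloor / 2^{n-i} + 0 \in S_{n-i}$;

b) When $N_x \bmod N_p = k \ne 0$: Obviously $1 \le k \le N_p - 1$. Considering $p < 1$,
we have $2^i/N_p > 1$, then $1 < 2^i \cdot (N_x \bmod N_p)/N_p < 2^i - 1$. Thus, from ATF
Property 2, $1 \le G_0(2^i \cdot (N_x \bmod N_p)/N_p) \le 2^i - 1$. Therefore,

$$\frac{\lfloor N_x/N_p \rfloor}{2^{n-i}} + \frac{1}{2^n} \le G_n(x/p) \le \frac{\lfloor N_x/N_p \rfloor}{2^{n-i}} + \frac{2^i - 1}{2^n}, \text{ i.e., } G_n(x/p) \notin S_{n-i}. \qquad (10)$$

From a) and b), we can deduce $G_n(x/p) \in S_{n-i} \Leftrightarrow N_x \equiv 0 \pmod{N_p}$.

At the same time, when $N_x \bmod N_p = 0$, $\mathrm{floor}_{n-i}(G_n(x/p)) = \lfloor N_x/N_p \rfloor / 2^{n-i}$;
when $N_x \bmod N_p = k \ne 0$, from (10) we have
$\mathrm{floor}_{n-i}(G_n(x/p)) \ge \mathrm{floor}_{n-i}(\lfloor N_x/N_p \rfloor / 2^{n-i} + 1/2^n) = \lfloor N_x/N_p \rfloor / 2^{n-i}$
and $\mathrm{floor}_{n-i}(G_n(x/p)) \le \mathrm{floor}_{n-i}(\lfloor N_x/N_p \rfloor / 2^{n-i} + (2^i - 1)/2^n) = \lfloor N_x/N_p \rfloor / 2^{n-i}$,
so finally we can get $\mathrm{floor}_{n-i}(G_n(x/p)) = \lfloor N_x/N_p \rfloor / 2^{n-i}$.

From the above result and (9), the following result is true:
$G_n(x/p) \bmod \dfrac{1}{2^{n-i}} = \dfrac{G_0(2^i \cdot (N_x \bmod N_p)/N_p)}{2^n}$.

The proof is complete.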
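
The three results of Lemma 3 can be checked numerically for the truncating case $G_n = \mathrm{floor}_n$. The sketch below works on integer numerators only; the function name is hypothetical, and it covers only $x < p$, the range in which $x/p$ stays inside $[0,1)$.

```python
def check_lemma3_floor(n, i, n_p):
    """Check (6)-(8) with G_n = floor_n for p = n_p/2**i and all x in S_n, x < p."""
    for n_x in range(n_p * 2**(n - i)):
        g = (n_x * 2**i) // n_p            # numerator of floor_n(x/p) over 2**n
        quotient, remainder = divmod(n_x, n_p)
        if (g % 2**i == 0) != (remainder == 0):        # result (6)
            return False
        if g >> i != quotient:                         # result (7): highest n-i bits
            return False
        if g % 2**i != (remainder * 2**i) // n_p:      # result (8): lowest i bits
            return False
    return True
```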
4.2 Results about $P_j$ ($i \le j \le n$)

Theorem 1. Assume random variable $x$ distributes uniformly in $S_n$. For the
digital PLCM (1), $\forall p \in D_{i,1}$ ($2 \le i \le n$), we have: $P_i = 4/2^i$.

Proof. Assume $p = N_p/2^i$, $x = N_x/2^n$, where $N_p$, $N_x$ are integers that satisfy
$1 \le N_p \le 2^i - 1$ and $0 \le N_x \le 2^n - 1$. Because $x$ distributes uniformly in $S_n$,
$N_x$ will distribute uniformly in the integer set $[0, 2^n - 1]$. Since the chaotic map is
defined piecewise, we consider it on different segments:

a) $x \in [0,p) \Leftrightarrow N_x \in [0, 2^{n-i} \cdot N_p - 1]$: $F_n(x,p) = G_n(x/p)$, so from Lemma 3
we know $F_n(x,p) \in S_{n-i}$ when and only when $N_x \equiv 0 \pmod{N_p}$. Because $N_x$
distributes uniformly in $[0, 2^n - 1]$, the probability of $F_n(x,p) \in S_{n-i}$ will be
$2^{n-i}/(2^{n-i} \cdot N_p) = 1/N_p$. That is to say, $P_i | x \in [0,p) = 1/N_p$.

b) $x \in [p, 1/2)$: Assume $x' = x - p$, $p' = 1/2 - p$; we have $F_n(x,p) = G_n(x'/p')$,
where $x' \in [0,p')$. Similarly to a), define $p' = N'_p/2^i$, $x' = N'_x/2^n$; we will get
$P_i | x \in [p, 1/2) = P_i | x' \in [0,p') = 1/N'_p$.

c) $x \in [1/2, 1)$: Considering that the map is even symmetric about $x = 1/2$, we can
easily get the following two results: $P_i | x \in (1/2, 1-p] = 1/N'_p$ and
$P_i | x \in ((1-p, 1) \cup \{1/2\}) = 1/N_p$. Here please note that $1 \notin S_n$ and that $1/2$ takes
the position that is symmetrical to 0, which will not make any difference to $P_i$.

From a) - c) and the total probability rule, we can deduce:

$$\begin{aligned} P_i &= P\{x \in [0,p)\} \cdot P_i | x \in [0,p) + P\{x \in [p,1/2)\} \cdot P_i | x \in [p,1/2) \\ &\quad + P\{x \in (1/2, 1-p]\} \cdot P_i | x \in (1/2, 1-p] \\ &\quad + P\{x \in ((1-p,1) \cup \{1/2\})\} \cdot P_i | x \in ((1-p,1) \cup \{1/2\}) \\ &= p \cdot \frac{1}{N_p} + p' \cdot \frac{1}{N'_p} + p' \cdot \frac{1}{N'_p} + p \cdot \frac{1}{N_p} = \frac{1}{2^i} + \frac{1}{2^i} + \frac{1}{2^i} + \frac{1}{2^i} = \frac{4}{2^i}. \end{aligned}$$

The proof is complete.
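
Theorem 1 can be confirmed by exhaustive enumeration over $S_n$ for small $n$. The following is a sketch under stated assumptions: truncation ($\mathrm{floor}_n$) is used as the DATF, values are integer numerators over $2^n$, the function names are hypothetical, and a map value of 1 is wrapped to 0 as in footnote 2.

```python
from fractions import Fraction

def digital_plcm_floor(n_x, n, i, n_p):
    """Numerator over 2**n of floor_n(F(x, p)) for x = n_x/2**n, p = n_p/2**i."""
    scale, half = 2**n, 2**(n - 1)
    p = n_p * 2**(n - i)              # p on the 2**n grid
    if n_x > half:                    # mirror branch F(1 - x, p)
        n_x = scale - n_x
    if n_x < p:
        y = (n_x * scale) // p
    else:
        y = ((n_x - p) * scale) // (half - p)
    return y % scale                  # F = 1 (reached at x = 1/2) becomes 0

def theorem1_probability(n, i, n_p):
    """P_i: fraction of x in S_n whose image has its lowest i bits equal to zero."""
    hits = sum(1 for n_x in range(2**n)
               if digital_plcm_floor(n_x, n, i, n_p) % 2**i == 0)
    return Fraction(hits, 2**n)
```

With `n = 8` and `i = 4`, every admissible numerator `n_p = 1, ..., 7` (so that $0 < p < 1/2$) yields `Fraction(1, 4)`, i.e. $4/2^4$.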



Theorem 2. Assume random variable $x$ distributes uniformly in $S_n$. For the
digital PLCM (1), $\forall p \in D_{i,1}$ ($2 \le i \le n$), $\mathrm{floor}_{n-i}(F_n(x,p))$^3 distributes uniformly
in $S_{n-i}$.

Proof. Similarly to the proof of Theorem 1, assume $p = N_p/2^i$, $x = N_x/2^n$; we
separately consider the map on different segments:

a) $x \in [0,p) \Leftrightarrow N_x \in [0, 2^{n-i} \cdot N_p - 1]$: $F_n(x,p) = G_n(x/p)$, so from Lemma
3 we have $\mathrm{floor}_{n-i}(F_n(x,p)) = \lfloor N_x/N_p \rfloor / 2^{n-i}$. Because $x$ distributes uniformly
in $S_n$, $N_x$ distributes uniformly in $[0, 2^{n-i} \cdot N_p - 1]$. Thus $\lfloor N_x/N_p \rfloor$ distributes
uniformly in $[0, 2^{n-i} - 1]$, i.e., $\mathrm{floor}_{n-i}(F_n(x,p))$ distributes uniformly in $S_{n-i}$
when $x \in [0,p)$.

b) $x \in [p, 1/2)$: Assume $x' = x - p$, $p' = 1/2 - p$; we have $F_n(x,p) = G_n(x'/p')$,
where $x' \in [0,p')$. Similarly to a), we can prove that $\mathrm{floor}_{n-i}(F_n(x,p))$ distributes
uniformly in $S_{n-i}$ when $x \in [p, 1/2)$.

c) $x \in [1/2, 1)$: Because the map is even symmetric about $x = 1/2$, it can
be easily deduced that $\mathrm{floor}_{n-i}(F_n(x,p))$ distributes uniformly in $S_{n-i}$ when
$x \in [1/2, 1)$.

From a) - c), we know that $\mathrm{floor}_{n-i}(F_n(x,p))$ distributes uniformly
in $S_{n-i}$. The proof is complete.

^3 The highest $n - i$ bits of $F_n(x,p)$.

Theorem 3. Assume random variable $x$ distributes uniformly in $S_n$. For the
digital PLCM (1), $\forall p \in D_{i,1}$ ($2 \le i \le n$) and $i \le j \le n$, $P_j = 4/2^j$ holds.

Proof. Let us discuss the two conditions $j = i$ and $j > i$ separately.

a) $j = i$: From Theorem 1, $P_j = 4/2^i = 4/2^j$;

b) $i < j \le n$: Assume $b_m$ ($m = 1 \sim n$) represents the $m$-th bit (from the lowest
bit to the highest one) of $F_n(x,p)$; then $P_j = P\{F_n(x,p) \in S_{n-i} \wedge b_j \cdots b_{i+1} = 0 \cdots 0\}$.
Recall the proof of Theorem 2: when $F_n(x,p) \in S_{n-i}$ (i.e., $N_x \bmod N_p = 0$),
$\lfloor N_x/N_p \rfloor$ (the highest $n - i$ bits of $F_n(x,p)$) still distributes uniformly in
$[0, 2^{n-i} - 1]$. So we can get $P_j = P\{F_n(x,p) \in S_{n-i}\} \cdot \frac{1}{2^{j-i}} = \frac{4}{2^i} \cdot \frac{1}{2^{j-i}} = \frac{4}{2^j}$.

From a) and b), we have: $i \le j \le n \Rightarrow P_j = 4/2^j$. The proof is complete.




4.3 Results about $P_j$ ($1 \le j < i$)

Firstly, we introduce Lemma 4 and Corollary 2, which will be used to facilitate 
the proof of Theorem 4. 

Lemma 4. Assume $n$ is an odd integer and random integer variable $K$ distributes
uniformly in $Z_n = [0, n-1]$. Then the following fact is true: $K' = f(K) = (2^i \cdot K) \bmod n$
distributes uniformly in $Z_n$, i.e., $\forall k \in [0, n-1]$, $P\{K' = k\} = 1/n$.

Proof. As we know, $(Z_n, +)$ is a finite cyclic group of order $n$, and $a$ is its
generator when and only when $\gcd(a,n) = 1$, where "+" is defined as
"$(a + b) \bmod n$" (see Theorem 2 on page 60 of [21]). Therefore, $a = 2^i \bmod n$ is one
generator of $Z_n$ since $\gcd(a,n) = \gcd(2^i, n) = 1$. Considering $K' = (2^i \cdot K) \bmod n =
(a \cdot K) \bmod n$, we can see that $f : Z_n \to Z_n$ is a bijection. Then we immediately
deduce that $K' = f(K)$ distributes uniformly in $Z_n$ because $K$ distributes uniformly
in $Z_n$. That is to say, $\forall k \in [0, n-1]$, $P\{K' = k\} = 1/n$. The proof is complete.

Corollary 2. Assume $n$ is an odd integer and random integer variable $K$ distributes
uniformly in $Z_n = [0, n-1]$. Then $\mathrm{dec}(2^i \cdot K/n)$ distributes uniformly in
$S = \{x \mid x = k/n, k \in Z_n\}$.

Proof. This corollary is the straightforward result of the above lemma.
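
The bijection used in Lemma 4 can be illustrated directly: multiplication by $2^i$ modulo an odd $n$ permutes $Z_n$, while for an even $n$ it does not. The function name below is hypothetical.

```python
def multiply_mod_is_bijection(i, n):
    """True if K -> (2**i * K) mod n maps {0, ..., n-1} onto itself."""
    image = {(2**i * k) % n for k in range(n)}
    return len(image) == n
```

For example, with `n = 6` and `i = 1` the image is `{0, 2, 4}`, so the uniformity argument of the lemma indeed needs the oddness of $n$.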

Theorem 4. Assume random variable $x$ distributes uniformly in $S_n$. For the
digital PLCM (1), $\forall p \in V_i$ ($2 \le i \le n$)^4 and $1 \le j \le i - 1$, we have:

$$P_j = \begin{cases} 1/2^j + 2/2^i, & G_n(\cdot) = \mathrm{floor}_n(\cdot) \text{ or } \mathrm{ceil}_n(\cdot) \\ \begin{cases} 1/2^j, & 1 \le j \le i-2 \\ 4/2^i, & j = i-1 \end{cases}, & G_n(\cdot) = \mathrm{round}_n(\cdot) \end{cases}$$

^4 Please note that the condition is $p \in V_i$, NOT $p \in D_{i,1}$ as in Theorems 1-3.

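Proof. Assume $p = N_p/2^i$, $x = N_x/2^n$, where $N_p$, $N_x$ are integers that satisfy
$1 \le N_p \le 2^i - 1$ and $0 \le N_x \le 2^n - 1$. Because $x$ distributes uniformly in $S_n$,
$N_x$ will distribute uniformly in the integer set $[0, 2^n - 1]$. Let us consider the
digital map on different segments:

a) $x \in [0,p) \Leftrightarrow N_x \in [0, 2^{n-i} \cdot N_p - 1]$: $F_n(x,p) = G_n(x/p)$, so from Lemma 3 we
know that the lowest $i$ bits of $F_n(x,p)$ are determined by $G_0(2^i \cdot (N_x \bmod N_p)/N_p)$.
Then we can deduce $F_n(x,p) \in S_{n-j} \Leftrightarrow G_0(2^i \cdot (N_x \bmod N_p)/N_p) \equiv 0 \pmod{2^j}$.
Define $N = N_x \bmod N_p$, which distributes uniformly in $[0, N_p - 1]$ because
of the uniform distribution of $N_x$. Define $a = (2^{i-j} \cdot N)/N_p$; we can rewrite
$G_0(2^i \cdot (N_x \bmod N_p)/N_p)$ as $G_0(2^j \cdot a)$. From Corollary 1, we can get:

$$G_0(2^j \cdot a) \equiv 0 \pmod{2^j} \Leftrightarrow \mathrm{dec}(a) \in \begin{cases} [0, \frac{1}{2^j}), & G_0(\cdot) = \lfloor \cdot \rfloor \\ (1 - \frac{1}{2^j}, 1) \cup \{0\}, & G_0(\cdot) = \lceil \cdot \rceil \\ [0, \frac{1}{2^{j+1}}) \cup [1 - \frac{1}{2^{j+1}}, 1), & G_0(\cdot) = \mathrm{round}(\cdot) \end{cases} \qquad (11)$$

From Corollary 2 (please note that $p \in V_i$ ensures that $N_p$ is an odd integer), we know

$$\mathrm{dec}(a) = k/N_p \ (k = 0 \sim N_p - 1) \text{ with uniform probability.} \qquad (12)$$

Based on (11) and (12), we can deduce:

$$G_0(2^j \cdot a) \equiv 0 \pmod{2^j} \Leftrightarrow k \in \begin{cases} [0, \frac{N_p}{2^j}), & G_0(\cdot) = \lfloor \cdot \rfloor \\ (N_p - \frac{N_p}{2^j}, N_p) \cup \{0\}, & G_0(\cdot) = \lceil \cdot \rceil \\ [0, \frac{N_p}{2^{j+1}}) \cup [N_p - \frac{N_p}{2^{j+1}}, N_p), & G_0(\cdot) = \mathrm{round}(\cdot) \end{cases} \qquad (13)$$

Considering that $k$ is an integer, we can get the probability

$$P\{G_0(2^j \cdot a) \equiv 0 \pmod{2^j}\} = \begin{cases} \dfrac{\lfloor N_p/2^j \rfloor + 1}{N_p}, & G_0(\cdot) = \lfloor \cdot \rfloor \text{ or } \lceil \cdot \rceil \\ \dfrac{2 \cdot \lfloor N_p/2^{j+1} \rfloor + 1}{N_p}, & G_0(\cdot) = \mathrm{round}(\cdot) \end{cases} \qquad (14)$$

b) $x \in [p, 1/2)$: Assume $x' = x - p$, $p' = 1/2 - p$; we have $F_n(x,p) = G_n(x'/p')$,
where $x' \in [0,p')$. Similarly to a), define $p' = N'_p/2^i$, $x' = N'_x/2^n$; we will get

$$P\{G_0(2^j \cdot a') \equiv 0 \pmod{2^j}\} = \begin{cases} \dfrac{\lfloor N'_p/2^j \rfloor + 1}{N'_p}, & G_0(\cdot) = \lfloor \cdot \rfloor \text{ or } \lceil \cdot \rceil \\ \dfrac{2 \cdot \lfloor N'_p/2^{j+1} \rfloor + 1}{N'_p}, & G_0(\cdot) = \mathrm{round}(\cdot) \end{cases} \qquad (15)$$

where $a' = (2^{i-j} \cdot N')/N'_p$, $N' = N'_x \bmod N'_p$.

From (14) and (15), we can get the contribution of $x \in [0, 1/2)$ to $P_j$.
Considering that the map is even symmetric about $x = 1/2$, the final probability
$P_j$ is twice this contribution. In the following, we separately consider the
conditions $G_n(\cdot) = \mathrm{floor}_n(\cdot)$ or $\mathrm{ceil}_n(\cdot)$ and $G_n(\cdot) = \mathrm{round}_n(\cdot)$:

i) $G_n(\cdot) = \mathrm{floor}_n(\cdot)$ or $\mathrm{ceil}_n(\cdot)$, i.e., $G_0(\cdot) = \lfloor \cdot \rfloor$ or $\lceil \cdot \rceil$:
$p + p' = 1/2 \Rightarrow N_p + N'_p = 2^{i-1} \Rightarrow 2^j \mid (N_p + N'_p)$, so from Lemma 2 we can deduce:

$$\begin{aligned} P_j &= 2 \left( p \cdot \frac{\lfloor N_p/2^j \rfloor + 1}{N_p} + p' \cdot \frac{\lfloor N'_p/2^j \rfloor + 1}{N'_p} \right) \\ &= 2 \cdot \frac{\lfloor N_p/2^j \rfloor + \lfloor N'_p/2^j \rfloor + 2}{2^i} = 2 \cdot \frac{2^{i-1-j} - 1 + 2}{2^i} = \frac{1}{2^j} + \frac{2}{2^i}. \end{aligned} \qquad (16)$$

ii) $G_n(\cdot) = \mathrm{round}_n(\cdot)$, i.e., $G_0(\cdot) = \mathrm{round}(\cdot)$: When $j < i - 1$,
$N_p + N'_p = 2^{i-1} \Rightarrow 2^{j+1} \mid (N_p + N'_p)$, so from Lemma 2 we can get:

$$\begin{aligned} P_j &= 2 \left( p \cdot \frac{2 \cdot \lfloor N_p/2^{j+1} \rfloor + 1}{N_p} + p' \cdot \frac{2 \cdot \lfloor N'_p/2^{j+1} \rfloor + 1}{N'_p} \right) \\ &= 2 \cdot \frac{2 (\lfloor N_p/2^{j+1} \rfloor + \lfloor N'_p/2^{j+1} \rfloor) + 2}{2^i} = 2 \cdot \frac{2 (2^{i-j-2} - 1) + 2}{2^i} = \frac{1}{2^j}. \end{aligned} \qquad (17)$$

When $j = i - 1$, $N_p + N'_p = 2^{i-1} \Rightarrow 2^{j+1} \nmid (N_p + N'_p)$ (since $j + 1 = i > i - 1$), so Lemma
2 cannot be used, but we can calculate the probability $P_j$ by directly observing
(14) and (15): $N_p < 2^i$, $N'_p < 2^i$, so $N_p/2^{j+1} < 1 \Rightarrow \lfloor N_p/2^{j+1} \rfloor = 0$ and
$N'_p/2^{j+1} < 1 \Rightarrow \lfloor N'_p/2^{j+1} \rfloor = 0$; then we have

$$P_j = 2 \left( p \cdot \frac{2 \cdot 0 + 1}{N_p} + p' \cdot \frac{2 \cdot 0 + 1}{N'_p} \right) = 2 \left( \frac{1}{2^i} + \frac{1}{2^i} \right) = \frac{4}{2^i}. \qquad (18)$$

From (16) - (18), we can directly get the final result. The proof is complete.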



4.4 Comprehensive Results about $P_j$ ($1 \le j \le n$)

In the above subsections, we have separately proved the results about $P_j$ ($i \le
j \le n$) and $P_j$ ($1 \le j < i$) for any $p \in V_i \subset S_i \subset S_n$ ($2 \le i \le n$). To make
the above "rough-and-tumble" results tidier, we rearrange them into two new
theorems, which are easier to understand and to use in practice.

Theorem 5. Assume random variable $x$ distributes uniformly in $S_n$. $\forall p \in
V_i$ ($2 \le i \le n$), the following results are true for the digital PLCM (1):

1. When $G_n(\cdot) = \mathrm{round}_n(\cdot)$, $P_j = \begin{cases} 4/2^j, & i \le j \le n \\ 4/2^i, & j = i-1 \\ 1/2^j, & 1 \le j \le i-2 \end{cases}$;

2. When $G_n(\cdot) = \mathrm{floor}_n(\cdot)$ or $\mathrm{ceil}_n(\cdot)$, $P_j = \begin{cases} 4/2^j, & i \le j \le n \\ 1/2^j + 2/2^i, & 1 \le j \le i-1 \end{cases}$;

3. $\forall k \in [0, 2^{n-i} - 1]$, $P\{\mathrm{floor}_{n-i}(F_n(x,p)) = k/2^{n-i}\} = 1/2^{n-i}$.

Proof. The first two parts are combinations of Theorems 3 and 4; the last part
is just equivalent to Theorem 2.
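
The first two parts of Theorem 5 can be compared against exhaustive enumeration for a concrete parameter. This is a sketch under stated assumptions: the function names and the `mode` argument are hypothetical, values are integer numerators over $2^n$, round($\cdot$) rounds halves upward, and a map value of 1 is wrapped to 0.

```python
from fractions import Fraction

def digital_plcm(n_x, n, p_num, mode):
    """Numerator over 2**n of G_n(F(x, p)), with x = n_x/2**n and p = p_num/2**n."""
    scale, half = 2**n, 2**(n - 1)
    if n_x > half:                       # mirror branch F(1 - x, p)
        n_x = scale - n_x
    if n_x < p_num:
        num, den = n_x * scale, p_num
    else:
        num, den = (n_x - p_num) * scale, half - p_num
    if mode == "floor":
        y = num // den
    else:                                # "round": round half up
        y = (2 * num + den) // (2 * den)
    return y % scale                     # a value of 1 wraps to 0

def measured(j, n, p_num, mode):
    hits = sum(1 for n_x in range(2**n)
               if digital_plcm(n_x, n, p_num, mode) % 2**j == 0)
    return Fraction(hits, 2**n)

def predicted(j, i, mode):
    """P_j stated by Theorem 5 for p in V_i."""
    if j >= i:
        return Fraction(4, 2**j)
    if mode == "floor":
        return Fraction(1, 2**j) + Fraction(2, 2**i)
    return Fraction(4, 2**i) if j == i - 1 else Fraction(1, 2**j)
```

For `n = 10` and $p = 3/16$ (`p_num = 192`, resolution 4), `measured(1, 10, 192, "floor")` equals `predicted(1, 4, "floor")`, namely 5/8, instead of the ideal 1/2.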

Remark 1. If $x$ distributes uniformly in the digital set $S_n$, $F_n(x,p)$ does not
distribute uniformly in $S_n$ (but its highest $n - i$ bits do in $S_{n-i}$, $\forall p \in V_i$), since
$P_j = 1/2^j$ would hold if $F_n(x,p)$ distributed uniformly in $S_n$. To understand what
Theorem 5 really means, see Fig. 1 for more visual details.

Fig. 1. $P_j$ ($1 \le j \le n$) when $p = 3/16 \in V_4 \subset S_4$, where the finite precision $n = 10$
(The line marked with diamond signs denotes the probability under digital uniform
distribution, $1/2^j$, and the other line denotes the probability $P_j$)



Remark 2. Note that there is an absolutely weak control parameter $p = 1/4 \in V_2 \subset
S_2$, which satisfies $P_2 = 4/2^2 = 1$. That is to say, the lowest 2 bits of $F_n(x,p)$ will
always be zeros. In addition, $\forall x_0 \in V_i$ ($2 \le i \le n$), after at most $\lceil i/2 \rceil$ iterations,
the chaotic orbit will converge to zero: $\forall m \ge \lceil i/2 \rceil$, $F_n^m(x_0) = 0$.
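
The collapse of every orbit for $p = 1/4$ is easy to reproduce. The sketch below is illustrative only: the function names are hypothetical, truncation is assumed as the DATF (for $p = 1/4$ the map multiplies by 4, so no rounding actually occurs), and it requires $n \ge 2$.

```python
def step_quarter(n_x, n):
    """One iteration of the digital PLCM (1) with p = 1/4 on the 2**n grid."""
    scale, half, p = 2**n, 2**(n - 1), 2**(n - 2)
    if n_x > half:                       # mirror branch F(1 - x, p)
        n_x = scale - n_x
    if n_x < p:
        y = (n_x * scale) // p
    else:
        y = ((n_x - p) * scale) // (half - p)
    return y % scale                     # a value of 1 wraps to 0

def iterations_to_zero(n_x, n):
    """Number of iterations until the orbit starting at n_x/2**n reaches 0."""
    count = 0
    while n_x != 0:
        n_x = step_quarter(n_x, n)
        count += 1
    return count
```

With `n = 10`, the orbit of `n_x = 1` is 1, 4, 16, 64, 256, 0, so it reaches zero after 5 iterations, and no starting value in $S_{10}$ needs more.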

Theorem 6. Assume random variable $x$ distributes uniformly in $S_n$, and $P_i =
P\{F_n(x,p) \in S_{n-i}\}$. The following results are true for the digital PLCM (1):

1. $\forall p \in D_{i,1} = S_i - S_1 = \bigcup_{k=2}^{i} V_k$, $P_i = 4/2^i$;

2. $\forall p \in V_{i+1}$, $P_i = 4/2^{i+1}$;

3. $\forall p \in V_j$ ($j \ge i + 2$), $P_i = \begin{cases} 1/2^i, & G_n(\cdot) = \mathrm{round}_n(\cdot) \\ 1/2^i + 2/2^j, & G_n(\cdot) = \mathrm{floor}_n(\cdot) \text{ or } \mathrm{ceil}_n(\cdot) \end{cases}$.

Proof. This theorem is an equivalent form of Theorem 5.

Remark 3. Theorem 6 tells us that for control parameters $p$ with different
resolutions (i.e., in different digital layers of $D_{n,1}$), rather large differences exist in
the generated chaotic orbits. Hence, from the observation of $P_1 \sim P_n$, one can
get the resolution of the control parameter $p$. In Fig. 2, we give the experimental
result of $P_5$ with respect to $p$ when $n = 10$ and $G_n(\cdot) = \mathrm{floor}_n(\cdot)$, which entirely
coincides with Theorem 6.

Fig. 2. $P_5 = P\{F_n(x,p) \in S_{n-5}\}$ with respect to $p$, where $n = 10$, $G_n(\cdot) = \mathrm{floor}_n(\cdot)$
(The dashed line denotes $2^{-5}$, the ideal probability under digital uniform distribution)
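
The observation in Remark 3 can be turned into a small sketch: given the exact probabilities $P_1 \sim P_n$, the resolution $i$ of $p$ is the smallest $j$ with $P_j = 4/2^j$, because for $j < i$ Theorem 5 gives a different value. The function names are hypothetical, and the sketch assumes exact probabilities (a real observer would only have estimates and would need a tolerance).

```python
from fractions import Fraction

def infer_resolution(probabilities):
    """Smallest j with P_j == 4 / 2**j, where probabilities[j - 1] holds P_j."""
    for j, p_j in enumerate(probabilities, start=1):
        if p_j == Fraction(4, 2**j):
            return j
    return None

def theorem5_floor(i, n):
    """P_1 .. P_n stated by Theorem 5 (floor or ceil case) for p in V_i."""
    return [Fraction(4, 2**j) if j >= i else Fraction(1, 2**j) + Fraction(2, 2**i)
            for j in range(1, n + 1)]
```

Feeding `theorem5_floor(4, 10)` into `infer_resolution` returns 4, the resolution of $p = 3/16$ used in Fig. 1.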



4.5 Extension to Other Digital PLCM-s 



Although the above results are based on the specific PLCM denoted by (1), they
can be essentially extended to all PLCM-s described in Sect. 3.1; of course, the
exact results will be different for different maps. From the proofs of the theorems
in the above subsections, we can see that the statistical degradation occurs because
of the piecewise linearity (Lemmas 3 and 4) and the essential properties of the
three ATF-s (Lemmas 1 and 2). Employing Lemmas 1-4 and Corollaries 1-2 on
other PLCM-s^5, we can easily obtain results corresponding to Theorems 5 and 6.
For example, we can get the results about the following chaotic map:

$$F(x,p) = \begin{cases} x/p, & x \in [0,p) \\ (1-x)/(1-p), & x \in [p,1] \end{cases} \qquad (19)$$

where $p$ satisfies $0 < p < 1$. This map is one of the simplest PLCM-s, and is
generally called the tent map.

Theorem 5'. Assume random variable $x$ distributes uniformly in $S_n$. $\forall p \in
V_i$ ($1 \le i \le n$), the following results are true for the digital tent map:

1. When $G_n(\cdot) = \mathrm{round}_n(\cdot)$, $P_j = \begin{cases} 2/2^j, & i \le j \le n \\ 2/2^i, & j = i-1 \\ 1/2^j, & 1 \le j \le i-2 \end{cases}$;

2. When $G_n(\cdot) = \mathrm{floor}_n(\cdot)$ or $\mathrm{ceil}_n(\cdot)$, $P_j = \begin{cases} 2/2^j, & i \le j \le n \\ 1/2^j + 1/2^i, & 1 \le j \le i-1 \end{cases}$;

3. $\forall k \in [0, 2^{n-i} - 1]$, $P\{\mathrm{floor}_{n-i}(F_n(x,p)) = k/2^{n-i}\} = 1/2^{n-i}$.

Experiments confirm these results. Of course there is a corresponding
Theorem 6'; we omit it here since it is just another form of Theorem 5'.
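
The tent-map results can be spot-checked by enumeration in the same way as for map (1). The sketch below assumes truncation ($\mathrm{floor}_n$), uses hypothetical function names, and wraps the value $F(p,p) = 1$ to 0, by analogy with the redefinition made for map (1).

```python
from fractions import Fraction

def digital_tent(n_x, n, p_num):
    """Numerator over 2**n of floor_n of the tent map (19), with p = p_num/2**n."""
    scale = 2**n
    if n_x < p_num:
        y = (n_x * scale) // p_num
    else:
        y = ((scale - n_x) * scale) // (scale - p_num)
    return y % scale                      # assumption: the value 1 wraps to 0

def tent_measured(j, n, p_num):
    hits = sum(1 for n_x in range(2**n) if digital_tent(n_x, n, p_num) % 2**j == 0)
    return Fraction(hits, 2**n)

def tent_predicted(j, i):
    """P_j stated by Theorem 5' (floor or ceil case) for p in V_i."""
    if j >= i:
        return Fraction(2, 2**j)
    return Fraction(1, 2**j) + Fraction(1, 2**i)
```

For `n = 8` and $p = 3/8$ (`p_num = 96`, resolution 3), `tent_measured(1, 8, 96)` gives 5/8, which matches $1/2 + 1/2^3$.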

^5 Any PLCM defined on an interval $[\alpha, \beta]$ can be re-scaled to its topologically conjugated
PLCM defined on $[0,1]$ with the linear function $h(x) = (x - \alpha)/(\beta - \alpha)$.

5 The Roles of Digital PLCM-s in Cryptography and
Pseudo-Random Coding

From Remark 1, we know that a uniformly distributed digital signal will
lead to a non-uniform distribution after iterations of a digital PLCM. Such
non-uniformity becomes more and more severe as the iterations go on; see Fig. 3 for
an intuitive view (compared with Fig. 2, the probability at most control
parameters increases, and the probability at $p = 1/16$ even reaches 1). We
can use the probability $P_i$ to denote the degree of such non-uniformity: for a
fixed control parameter, the larger $P_i$ is, the larger the degradation will be. In
Remark 2, $p = 1/4 \in V_2$ corresponds to the most serious degradation, so it is
the weakest control parameter. The next weakest control parameters are the ones in
$V_3$; then those in $V_4$, $V_5$, $\ldots$.

Fig. 3. $P_5 = P\{F_n^{32}(x,p) \in S_{n-5}\}$ with respect to $p$, where $n = 10$, $G_n(\cdot) = \mathrm{floor}_n(\cdot)$
($P_5$ is the probability after 32 chaotic iterations of the digital PLCM (1); the dashed
line denotes $2^{-5}$, the ideal probability under digital uniform distribution)

5.1 Performance of the Three Remedies to Digital PLCM-s 

In Sect. 1, we have mentioned three remedies proposed by other researchers. In 
this subsection, we discuss whether they will work well to improve the degrada- 
tion of digital PLCM-s. 

Apparently, cascading multiple digital chaotic maps cannot essentially
remove the weaknesses, since multiple cascaded PLCM-s are just equivalent to a
new PLCM with more segments.

Using higher precision cannot change the weakness of any fixed control
parameter either. For example, for the map (1), $p = 1/4$ will always be absolutely
weak for any finite precision, and $\forall p \in V_i$ will always be equally weak for any finite
precision $n \ge i$. But higher precision will introduce more, stronger digital layers^6
and thus reduce the overall weakness, which improves the situation.

Now assume that the perturbation-based algorithm is used to reduce the
degradation of digital PLCM-s. We find that there exists a "strange" paradox: assume
the chaotic orbit $\{x(m)\}_{m=1}^{\infty}$ is improved by perturbation to obey a nearly uniform
distribution; according to Theorems 5 and 6, the chaotic sub-orbit $\{x(m)\}_{m=2}^{\infty}$ will
not obey uniform distribution because $\{x(m)\}_{m=2}^{\infty} = F_n(\{x(m)\}_{m=1}^{\infty}, p)$; thus
$\{x(m)\}_{m=1}^{\infty}$ will not either. What does such a fact mean? It implies that the
non-uniformity revealed by the above theorems is the lower bound of the degradation of
digital chaotic orbits. In other words, the perturbation-based algorithm cannot
essentially improve the degradation to a better condition than the one depicted
in Theorems 5 and 6. However, as we will point out in the next subsection, the
perturbation-based algorithm is still useful to enhance digital chaotic ciphers
and pseudo-random coding when used with careful consideration.



5.2 Notes on Chaotic Ciphers and Pseudo-Random Coding

If digital PLCM-s are directly used in chaotic ciphers and the control parameter
is used as the secret key (as in most chaotic ciphers), the cryptographic
properties of the ciphers will not be perfect, and many weak keys will arise
(see Fig. 3), because of the severe degradation induced by the digital chaotic
iterations.




Fig. 4. Digital chaotic cipher with secretly exerted perturbation (The perturbation 
should be secretly exerted at position B not A) 



To escape from such a bad condition and enhance the security, we suggest
using the perturbation-based algorithm as follows: the perturbation is secretly
exerted and the chaotic orbit is output after perturbation (see Fig. 4). This is
based on the following fact: if $\{x(m)\}_{m=1}^{\infty}$ can be observed by an intruder, he
will probably be able to judge the resolution $i$ of the right key through the
probabilities $P_j$ ($j = 1 \sim n$) (see Theorem 6 and Remark 3), and then search for the key
only in the digital layer $V_i$, which is smaller than the whole key space (the smaller $i$
is, the faster the search will be and the weaker the key). If the perturbation is exerted
secretly at point B, an intruder can only observe the perturbed $\{x(m)\}_{m=1}^{\infty}$, not
the original orbit itself, so it is relatively more difficult for him to get information
about $K_1$ without knowing $K_2$. But it is obvious that $K_1$ will still be weak if
$K_2$ is broken, and vice versa. This means that the final key entropy will be smaller
than the sum of the two sub ones: $H(K) = H((K_1, K_2)) < H(K_1) + H(K_2)$.

^6 When the finite precision increases from $n$ to $n'$, $n' - n$ stronger digital layers
$V_{n+1} \sim V_{n'}$ will be added, although the $n$ old digital layers $V_1 \sim V_n$ remain.

If digital PLCM-s are used to generate pseudo-random bits, the generated
binary sequences may be unbalanced since the chaotic orbits are not uniform.
For example, if the map denoted by (1) with $p = 1/4$ is selected and the lowest
2 bits of the chaotic orbit are used to generate pseudo-random bits, they
will be $000 \cdots$. Fortunately, from Theorem 2, we can use the highest $n - i$ bits to
construct the desired pseudo-random bits. Here please note that an (approximately)
uniform distribution of the chaotic input is required. The perturbation-based
algorithm will be useful for such a task.



6 Conclusion 

We have rigorously proved some statistical properties of digital piecewise linear
chaotic maps (PLCM) and explained their roles in chaotic cryptography and
pseudo-random coding. Our work will be useful for the design and performance
analysis of chaotic ciphers with theoretical security and of PRBG-s with really
good statistical properties.

For other chaotic maps, our results cannot be extended straightforwardly. But
the proofs made in this paper depend on some essential properties of ATF-s
(Lemmas 1 and 2) and the following fact: on every monotonic segment of a digital
chaotic map, one control parameter is proportional to the uniformly distributed
final output (Lemmas 3 and 4). Considering that a uniform final output is always
desired for cryptography and pseudo-random coding, the proofs may be available for
other digital chaotic maps that can be used in the two areas. In the future, we
will try to find results concerning more generic digital chaotic maps.

Acknowledgement. The authors wish to thank Dr. Di Shuang-liang at Xi’an 
Jiaotong University for his valuable suggestions, and Miss Han Lu at Xi’an 
Foreign Language University for her help in the preparation of the final paper. 



Appendix A: The Proof of Lemma 1 



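Proof. We prove the three sub-lemmas separately:

1. Because $a = \lfloor a \rfloor + \mathrm{dec}(a)$, $n \cdot a = n \cdot \lfloor a \rfloor + n \cdot \mathrm{dec}(a)$.
Considering $0 \le \mathrm{dec}(a) < 1$, $0 \le n \cdot \mathrm{dec}(a) < n \Rightarrow 0 \le \lfloor n \cdot \mathrm{dec}(a) \rfloor \le n - 1$.
From the definition of $\lfloor \cdot \rfloor$, we can get
$\lfloor n \cdot a \rfloor = \lfloor n \cdot (\lfloor a \rfloor + \mathrm{dec}(a)) \rfloor = n \cdot \lfloor a \rfloor + \lfloor n \cdot \mathrm{dec}(a) \rfloor
\Rightarrow n \cdot \lfloor a \rfloor \le \lfloor n \cdot a \rfloor \le n \cdot \lfloor a \rfloor + (n - 1)$,
where $n \cdot \lfloor a \rfloor = \lfloor n \cdot a \rfloor \Leftrightarrow \lfloor n \cdot \mathrm{dec}(a) \rfloor = 0$, that is to
say, $0 \le n \cdot \mathrm{dec}(a) < 1 \Leftrightarrow \mathrm{dec}(a) \in [0, 1/n)$.

2. i) When $\mathrm{dec}(a) = 0$: $\lceil n \cdot a \rceil = n \cdot a = n \cdot \lceil a \rceil$. ii) When $\mathrm{dec}(a) \in (0, 1)$: Assume
$\mathrm{dec}'(a) = 1 - \mathrm{dec}(a) \in (0, 1)$, then $a = \lceil a \rceil - \mathrm{dec}'(a)$, then $n \cdot a = n \cdot \lceil a \rceil - n \cdot \mathrm{dec}'(a)$.
Considering $0 < n \cdot \mathrm{dec}'(a) < n$, $n \cdot \lceil a \rceil - n < n \cdot a = n \cdot \lceil a \rceil - n \cdot \mathrm{dec}'(a) < n \cdot \lceil a \rceil$.
From the definition of $\lceil \cdot \rceil$, we can get $n \cdot \lceil a \rceil - (n - 1) \le \lceil n \cdot a \rceil \le n \cdot \lceil a \rceil$, where
$n \cdot \lceil a \rceil = \lceil n \cdot a \rceil \Leftrightarrow n \cdot \mathrm{dec}'(a) \in (0, 1)$, then $\mathrm{dec}(a) \in (1 - 1/n, 1)$. As a whole, we
have $n \cdot \lceil a \rceil - (n - 1) \le \lceil n \cdot a \rceil \le n \cdot \lceil a \rceil$, and $n \cdot \lceil a \rceil = \lceil n \cdot a \rceil$ when and only
when $\mathrm{dec}(a) \in (1 - 1/n, 1) \cup \{0\}$.

3. From the definition of $\mathrm{round}(\cdot)$, we have $\mathrm{round}(a) - 1/2 \le a < \mathrm{round}(a) + 1/2$.
Thus $n \cdot \mathrm{round}(a) - n/2 \le n \cdot a < n \cdot \mathrm{round}(a) + n/2$. i) When $n$ is an even
integer, it is obvious that $n \cdot \mathrm{round}(a) - n/2 \le \mathrm{round}(n \cdot a) \le n \cdot \mathrm{round}(a) + n/2$. ii)
When $n$ is an odd integer, $n \cdot \mathrm{round}(a) - n/2 + 1/2 \le \mathrm{round}(n \cdot a) \le n \cdot \mathrm{round}(a) + n/2 - 1/2$,
that is to say, $n \cdot \mathrm{round}(a) - (n - 1)/2 \le \mathrm{round}(n \cdot a) \le n \cdot \mathrm{round}(a) + (n - 1)/2$.
As a whole, we can deduce: $n \cdot \mathrm{round}(a) - \lfloor n/2 \rfloor \le \mathrm{round}(n \cdot a) \le n \cdot \mathrm{round}(a) + \lfloor n/2 \rfloor$,
where $n \cdot \mathrm{round}(a) = \mathrm{round}(n \cdot a) \Leftrightarrow n \cdot \mathrm{round}(a) - 1/2 \le n \cdot a < n \cdot \mathrm{round}(a) + 1/2$,
that is to say, $\mathrm{dec}(a) \in [0, 1/(2n)) \cup [1 - 1/(2n), 1)$.
The proof is complete.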
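The three sub-lemmas can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes a >= 0, uses exact rationals, takes round(a) to be the integer with round(a) - 1/2 <= a < round(a) + 1/2, and tests the equality conditions dec(a) in [0, 1/n), dec(a) in (1 - 1/n, 1) or dec(a) = 0, and dec(a) in [0, 1/(2n)) or [1 - 1/(2n), 1) that follow from the derivation above. The function names and the grid of test values are illustrative choices.

```python
from fractions import Fraction
from math import floor, ceil


def dec(a):
    """Fractional part of a non-negative rational a."""
    return a - floor(a)


def rnd(a):
    """Rounding with rnd(a) - 1/2 <= a < rnd(a) + 1/2 (halves round up)."""
    return floor(a + Fraction(1, 2))


def check_lemma1(max_n=6, denom=24, max_int=3):
    """Test the three sub-lemmas on a grid of rationals a = num/denom >= 0."""
    for n in range(1, max_n + 1):
        for num in range(0, max_int * denom + 1):
            a = Fraction(num, denom)
            d = dec(a)
            # Sub-lemma 1: floor.
            base, scaled = n * floor(a), floor(n * a)
            if not (base <= scaled <= base + n - 1):
                return False
            if (base == scaled) != (d < Fraction(1, n)):
                return False
            # Sub-lemma 2: ceiling.
            base, scaled = n * ceil(a), ceil(n * a)
            if not (base - (n - 1) <= scaled <= base):
                return False
            if (base == scaled) != (d == 0 or d > 1 - Fraction(1, n)):
                return False
            # Sub-lemma 3: rounding.
            base, scaled = n * rnd(a), rnd(n * a)
            if not (base - n // 2 <= scaled <= base + n // 2):
                return False
            in_set = d < Fraction(1, 2 * n) or d >= 1 - Fraction(1, 2 * n)
            if (base == scaled) != in_set:
                return False
    return True
```

Such a check is no substitute for the proof; it only guards against transcription errors in the inequalities.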
Appendix B: The Proof of Lemma 2

Proof. Because $a = \lfloor a \rfloor + \mathrm{dec}(a)$, $\lfloor N/2^j \rfloor + \lfloor N'/2^j \rfloor = (N/2^j - \mathrm{dec}(N/2^j)) +
(N'/2^j - \mathrm{dec}(N'/2^j))$. Assume $N = n_1 \cdot 2^j + n_2$, $N' = n_1' \cdot 2^j + n_2'$ and
$N + N' = 2^k$ $(k > j)$; we have $\mathrm{dec}(N/2^j) = (N \bmod 2^j)/2^j = n_2/2^j$ and
$\mathrm{dec}(N'/2^j) = (N' \bmod 2^j)/2^j = n_2'/2^j$. Since $N, N'$ are odd integers, we can get $n_2 > 0$, $n_2' > 0$.
From $2^j \mid (N + N')$, it is obvious that $n_2 + n_2' = 2^j \Rightarrow \mathrm{dec}(N/2^j) + \mathrm{dec}(N'/2^j) = 1$,
thus $\lfloor N/2^j \rfloor + \lfloor N'/2^j \rfloor = (N + N')/2^j - 1$. The proof is complete.
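The identity can also be confirmed by brute force. The sketch below is an illustration only: it assumes N and N' are odd positive integers with N + N' a power of two, writes the exponent of the divisor as j, and checks every j from 1 up to and including k. The function name and the range of k are arbitrary.

```python
def check_lemma2(max_k=10):
    """Check floor(N/2^j) + floor(N'/2^j) == (N + N')/2^j - 1 for odd N, N'."""
    for k in range(1, max_k + 1):
        total = 1 << k                    # N + N' = 2^k
        for n in range(1, total, 2):      # odd N, hence N' = 2^k - N is odd too
            n_prime = total - n
            for j in range(1, k + 1):     # every j with 2^j dividing N + N'
                if (n >> j) + (n_prime >> j) != (total >> j) - 1:
                    return False
    return True
```

Right shifts implement the floor of a division by a power of two for non-negative integers.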
Abstract. We explain the theoretical background of the wide trail design
strategy, which was used to design Rijndael, the Advanced Encryption
Standard (AES). In order to facilitate the discussion, we introduce
our own notation to describe differential and linear cryptanalysis. We
present a block cipher structure and prove bounds on the resistance
against differential and linear cryptanalysis.



1 Introduction 

The development of differential [2] and linear cryptanalysis [7] has led to several 
design theories for block ciphers. The most important requirement for a new 
cipher is that it resists state-of-the-art cryptanalytic attacks. Preferably, this 
can be demonstrated in a rigorous, mathematical way. The second requirement
is good performance at an acceptable 'cost' in terms of CPU and memory
requirements.

The wide trail strategy is an approach to designing the round transformations of
block ciphers that combine efficiency and resistance against differential and linear 
cryptanalysis. The strategy has been used in the design of Rijndael, the block 
cipher which has been selected to become the Advanced Encryption Standard 
(AES). In this article we describe the application of the strategy to the design 
of a certain type of block ciphers only, but the strategy can easily be extended 
to more general block cipher structures. Moreover, the wide trail strategy can 
also be applied to the design of synchronous stream ciphers and hash functions. 

In order to explain the wide trail strategy, we introduce our own notation
for differential and linear cryptanalysis. We are convinced that a good notation
helps in following the reasoning, and our notation is well suited to explaining
the wide trail strategy. We introduce a general block cipher model and
explain how linear correlations and difference propagation probabilities are built
up in block ciphers designed according to this model. Subsequently, we explain
the basic principles of the wide trail strategy and introduce our new diffusion
measure, the branch number. We explain its relevance in providing bounds for
the probability of differential trails and the correlation of linear trails over two
rounds. We then introduce a cipher structure that combines efficiency with high
resistance against linear and differential cryptanalysis. This resistance
is based on a theorem that gives a lower bound on the
diffusion after four rounds of the cipher structure. In this paper, we emphasize
the theoretical foundations of the wide trail design strategy. More explanation
of the practical constructions can be found in [4].

In the following, the symbols $+$ and $\oplus$ are used to denote bit-wise addition
(XOR). The results can be generalized to other definitions for addition.



2 A General Block Cipher Model 

We introduce a model for block ciphers that can be analyzed easily for their 
resistance against linear and differential cryptanalysis. 



2.1 Key-Alternating Block Ciphers

A block cipher transforms plaintext blocks of a fixed length $n_b$ to ciphertext blocks
of the same length under the influence of a key $k$. An iterative block cipher is
defined as the application of a number of key-dependent Boolean permutations.
These Boolean permutations are called the round transformations. Every
application of a round transformation is called a round. We denote the number of
rounds by $r$. We have:

$$\beta[k] = \rho^{(r)}[k^{(r)}] \circ \cdots \circ \rho^{(2)}[k^{(2)}] \circ \rho^{(1)}[k^{(1)}] . \tag{1}$$

In this expression, $\rho^{(i)}$ is called the i-th round of the block cipher and $k^{(i)}$ is called
the i-th round key. For instance, the DES has 16 rounds. Every round uses the
same round transformation, so we say there is only one round transformation.
The round keys are computed from the cipher key. Usually, this is specified with
an algorithm, called the key schedule.

A key-alternating block cipher is an iterative block cipher with the following 
properties: 

- Alternation: the cipher is defined as the alternated application of key-independent
  round transformations and the application of a round key. The first
  round key is applied before the first round and the last round key is applied
  after the last round.

- Binary Key Addition: the round keys are applied by means of a simple XOR:
  to each bit of the intermediate state a round key bit is XORed.

We have:

$$\beta[k] = \sigma[k^{(r)}] \circ \rho^{(r)} \circ \sigma[k^{(r-1)}] \circ \cdots \circ \sigma[k^{(1)}] \circ \rho^{(1)} \circ \sigma[k^{(0)}] . \tag{2}$$

As will become clear, key-alternating block ciphers lend themselves
very well to the analysis of their resistance against cryptanalysis.
2.2 The $\gamma\lambda$ Round Structure

In the wide trail strategy, the round transformations are composed of two
invertible steps:

- $\gamma$: a local non-linear transformation. By local, we mean that any output bit
  depends on only a limited number of input bits and that neighboring output
  bits depend on neighboring input bits.

- $\lambda$: a linear mixing transformation providing high diffusion. What is meant
  by high diffusion will be explained in the following sections.

Hence we have a round transformation $\rho$:

$$\rho = \lambda \circ \gamma , \tag{3}$$

and refer to this as a $\gamma\lambda$ round transformation.

A typical construction for $\gamma$ is the so-called bricklayer mapping consisting
of a number of invertible S-boxes. In this construction, the bits of input vector
$a$ are partitioned into $n_t$ m-bit bundles $a_i \in \mathbb{Z}_2^m$ with $i \in \mathcal{I}$ by the so-called
bundle partition. $\mathcal{I}$ is called the index space. Clearly, the inverse of $\gamma$ consists
of applying the inverse substitution boxes to the bundles. The block size of the
cipher is given by $n_b = m n_t$. In the case of the AES, the bundle size $m$ is 8,
hence bundles are bytes. This is by no means a necessity. For instance, Serpent
[1] and Noekeon [5] can also be described in this framework, but have a bundle
size of 4 bits. 3-WAY [3] uses 3-bit bundles.

For the purpose of this analysis, the S-boxes of $\gamma$ need not be specified.
Since the use of different S-boxes for different bundles does not result in a
plausible improvement of the resistance against known attacks, we propose to use
the same S-box for all bundles. This reduces the code size in software
and the required chip area in hardware implementations.

The transformation $\lambda$ combines the bundles linearly: each bundle at the
output is a linear function of bundles at the input. $\lambda$ can be specified at the bit level
by a simple $n_b \times n_b$ binary matrix $M$. We have

$$\lambda : b = \lambda(a) \Leftrightarrow b = Ma . \tag{4}$$

$\lambda$ can also be specified at the bundle level. For example, the bundles can be
considered as elements in GF($2^m$) with respect to some basis. In its most general
form, we have:

$$\lambda : b = \lambda(a) \Leftrightarrow b_i = \sum_j \sum_{0 \le \ell < m} C_{i,j,\ell} \, a_j^{2^\ell} . \tag{5}$$

In most instances a simpler linear function is chosen that is a special case
of (5):

$$\lambda : b = \lambda(a) \Leftrightarrow b_i = \sum_j C_{i,j} a_j . \tag{6}$$

Figure 1 gives a schematic representation of a $\gamma\lambda$ round transformation, followed
by a key addition.

Fig. 1. Schematic representation of a $\gamma\lambda$ round, followed by a key addition.
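The structure of this section can be sketched in a few lines of code. The following toy example is not a cipher from the text and offers no security: it assumes four 4-bit bundles, an arbitrary 4-bit S-box, a simple invertible XOR mixing as the linear step, and round keys that are supplied directly instead of being derived by a key schedule. All names and parameters are hypothetical.

```python
# Toy parameters (assumptions): m = 4 bits per bundle, n_t = 4 bundles.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # an arbitrary 4-bit permutation
INV_SBOX = [SBOX.index(v) for v in range(16)]


def gamma(state):
    """Bricklayer step: the same S-box applied to every bundle."""
    return [SBOX[x] for x in state]


def gamma_inv(state):
    return [INV_SBOX[x] for x in state]


def lam(state):
    """Linear mixing: b_i = a_i + a_(i+1) + a_(i+2), indices mod n_t, + is XOR."""
    n = len(state)
    return [state[i] ^ state[(i + 1) % n] ^ state[(i + 2) % n] for i in range(n)]


def lam_inv(state):
    """For n_t = 4 this particular map satisfies lam^4 = identity, so lam^-1 = lam^3."""
    return lam(lam(lam(state)))


def sigma(state, round_key):
    """Key addition: XOR one round key bundle into each state bundle."""
    return [x ^ k for x, k in zip(state, round_key)]


def encrypt(state, round_keys):
    """sigma[k_r] o rho o ... o sigma[k_1] o rho o sigma[k_0], with rho = lam o gamma."""
    state = sigma(state, round_keys[0])
    for rk in round_keys[1:]:
        state = sigma(lam(gamma(state)), rk)
    return state


def decrypt(state, round_keys):
    """Undo the key additions and rounds in reverse order."""
    for rk in reversed(round_keys[1:]):
        state = gamma_inv(lam_inv(sigma(state, rk)))
    return sigma(state, round_keys[0])
```

With three round keys the sketch performs two rounds; the first key is added before the first round and the last key after the last round, as the alternation property requires.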



3 Propagation in Key-Alternating Block Ciphers 

In the following subsections we describe the anatomy of correlations and differ- 
ence propagations in key-alternating block ciphers. This is used to determine 
the number of rounds required to provide resistance against linear and differ- 
ential cryptanalysis. We assume that the round transformations do not exhibit 
correlations with amplitude 1 or difference propagation with probability 1. The 
limitation to the key-alternating structure allows us to reason more easily about 
linear and differential trails as the effect of the key addition on the propagation 
is quite simple. 

3.1 Differential Cryptanalysis 

We assume that the reader has a basic understanding of the principles of dif- 
ferential cryptanalysis as explained in [2]. We give a very short overview and 
introduce our notation. 

Consider a couple of n-bit vectors $a$ and $a^*$ with bitwise difference $a + a^* = a'$.
Let $b = h(a)$, $b^* = h(a^*)$ and $b' = b + b^*$. The difference $a'$ propagates to the
difference $b'$ through $h$. In general, $b'$ is not fully determined by $a'$ but depends
on the value of $a$ (or $a^*$).

Definition 1. A difference propagation probability $P^h(a', b')$ is defined as

$$P^h(a', b') = 2^{-n} \sum_a \delta(b' + h(a + a') + h(a)) . \tag{7}$$

Here $\delta(a)$ denotes the Kronecker delta function, which outputs zero, except when
the input equals zero: $\delta(0) = 1$. If a pair is chosen uniformly from the set of all
pairs $(a, a^*)$ with $a + a^* = a'$, $P^h(a', b')$ is the probability that $h(a) + h(a^*) = b'$.
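Definition 1 translates directly into a counting procedure. The sketch below is illustrative only: the 4-bit permutation used as h is an arbitrary assumption, and the function names are not from the text.

```python
from fractions import Fraction

# Illustrative 4-bit permutation h (an assumption, not taken from the text).
H = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
     0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 4  # number of input and output bits


def diff_prob(h, a_diff, b_diff, n=N):
    """P^h(a', b') = 2^-n * #{a : h(a + a') + h(a) = b'}, where + is XOR."""
    hits = sum(1 for a in range(1 << n) if h[a ^ a_diff] ^ h[a] == b_diff)
    return Fraction(hits, 1 << n)


def diff_row(h, a_diff, n=N):
    """Propagation probabilities from a' to every output difference b'."""
    return [diff_prob(h, a_diff, b, n) for b in range(1 << n)]
```

For a fixed input difference the probabilities over all output differences sum to 1, and for a non-zero input difference every value is a multiple of 2^(1-n), because the values a and a + a' always satisfy the condition together.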

Let $\beta$ be a Boolean mapping operating on n-bit vectors that is composed
of $r$ mappings: $\beta = \rho^{(r)} \circ \rho^{(r-1)} \circ \cdots \circ \rho^{(2)} \circ \rho^{(1)}$. A differential trail $Q$ over a
composed mapping consists of a sequence of $r + 1$ difference patterns:

$$Q = (q^{(0)}, q^{(1)}, q^{(2)}, \ldots, q^{(r-1)}, q^{(r)}) . \tag{8}$$

A differential trail has a probability. The probability of a differential trail is the
number of values $a_0$ for which the difference patterns follow the differential trail
divided by the number of possible values for $a_0$. This differential trail is composed
of $r$ differential steps $(q^{(i-1)}, q^{(i)})$, which have a propagation probability:

$$P^{\rho^{(i)}}(q^{(i-1)}, q^{(i)}) . \tag{9}$$

Differential cryptanalysis exploits difference propagations with large
probabilities. The probability of difference propagation $(a', b')$ is the sum of
the probabilities of all r-round differential trails with initial difference $a'$ and
terminal difference $b'$, i.e.,

$$P(a', b') = \sum_{q^{(0)} = a',\; q^{(r)} = b'} P(Q) . \tag{10}$$

3.2 Achieving Low Difference Propagation Probabilities 

For a successful classical differential cryptanalysis attack, the cryptanalyst needs
to know an input difference pattern that propagates to an output difference
pattern over all but a few (2 or 3) rounds of the cipher, with a probability that
is significantly larger than $2^{1-n_b}$. To avoid this, we choose the number of rounds
so that there are no differential trails with a probability above $2^{1-n_b}$.

This strategy does not guarantee that there are no such difference propagations
with a high probability. Equation (10) shows that in principle, many trails,
each with a low probability, may add up to a difference propagation with high
probability. As a matter of fact, for any Boolean mapping, a difference pattern
at the input must propagate to some difference pattern at the output, and the
sum of the difference propagation probabilities over all possible output differences
is 1. Hence, there must be difference propagations with probability equal
to or larger than $2^{1-n_b}$. This also applies to the Boolean mapping formed by
a cipher for a given value of the cipher key. Hence, the presence of difference
propagations with a high probability over any number of rounds of the cipher is
a mathematical fact which can't be avoided by design.

Let us analyze a difference propagation with probability $y$ for a given key
value. A difference propagation probability $y$ means that there are exactly $y 2^{n_b - 1}$
pairs with the given input difference pattern and the given output difference
pattern. Each of these pairs follows a particular differential trail.

Assuming that the pairs are distributed over the trails according to a Poisson
distribution, the expected number of pairs that, for a given key value, follow a
differential trail with propagation probability $2^{-z}$ is $2^{n_b - 1 - z}$. Consider a
differential trail with a propagation probability $2^{-z}$ smaller than $2^{1-n_b}$ that is
followed by at least one pair. The probability that this trail is followed by more
than one pair is approximately $2^{n_b - 1 - z}$. It follows that if there are no
differential trails with a propagation probability above $2^{1-n_b}$, the $y 2^{n_b - 1}$ pairs that
have the correct input difference pattern and output difference pattern follow
almost $y 2^{n_b - 1}$ different differential trails.
If there are no differential trails with a low weight, difference propagations
with a large probability are the result of multiple differential trails that happen 
to be followed by a pair in the given circumstances, i.e. for the given key value. 
For another key value, each of these individual differential trails may be fol- 
lowed by a pair, or not. This makes predicting the input difference patterns and 
output difference patterns that have large difference propagation probabilities 
practically infeasible. This is true if the key is known, and even more so if it is 
unknown. 

We conclude that restricting the probability of difference propagations is a 
sound design strategy. However, it doesn’t result in a proof of security. 



3.3 Linear Cryptanalysis 

We assume again that the reader is familiar with the basic principles of linear 
cryptanalysis [7]. However, instead of using the notions probability of a linear 
approximation, and deviation, we prefer to use our own formalism, based on 
correlation. 

Definition 2. The correlation $C(f, g)$ between two Boolean functions $f(a)$ and
$g(a)$ is defined as

$$C(f, g) = 2 \cdot \mathrm{Prob}(f(a) = g(a)) - 1 . \tag{11}$$

It follows that $C(f, g) = C(g, f)$. A parity of a Boolean vector is a Boolean
function that consists of the XOR of a number of bits. A parity is determined
by the positions of the bits of the Boolean vector that are included in the XOR.
The selection pattern $w$ of a parity is a Boolean vector value that has a 1 in
the components that are included in the parity and a 0 in all other components.
Analogous to the inner product of vectors in linear algebra, we express the parity
of vector $a$ corresponding with selection pattern $w$ as $w^t a$. In this expression the
$t$ suffix denotes transposition of the vector $w$.

Note that for a vector $a$ with $n$ bits, there are $2^n$ different parities. The set
of parities of a Boolean vector is in fact the set of all linear Boolean functions of
that vector.
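Correlations and parities are easy to compute for small vectors. The following minimal sketch represents a selection pattern as an integer bit mask; the helper names are illustrative and not part of the text.

```python
from fractions import Fraction


def parity(w, a):
    """w^t a: XOR of the bits of a selected by the pattern w."""
    return bin(w & a).count("1") & 1


def parity_fn(w):
    """The parity with selection pattern w, as a Boolean function of a."""
    return lambda a: parity(w, a)


def correlation(f, g, n):
    """C(f, g) = 2 * Prob(f(a) = g(a)) - 1 over all n-bit vectors a."""
    agree = sum(1 for a in range(1 << n) if f(a) == g(a))
    return Fraction(2 * agree, 1 << n) - 1
```

A parity has correlation 1 with itself, correlation -1 with its complement, and correlation 0 with every other parity, which is why distinct selection patterns give distinct linear functions.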

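A linear trail $U$ over a composed mapping consists of a sequence of $r + 1$
selection patterns

$$U = (u^{(0)}, u^{(1)}, u^{(2)}, \ldots, u^{(r-1)}, u^{(r)}) . \tag{12}$$

This linear trail is composed of $r$ linear steps $(u^{(i-1)}, u^{(i)})$ that have a
correlation:

$$C\left({u^{(i)}}^{t} \rho^{(i)}(a),\; {u^{(i-1)}}^{t} a\right) .$$

The correlation contribution $C_p$ of a linear trail is the product of the correlation
of all its steps:

$$C_p(U) = \prod_i C\left({u^{(i)}}^{t} \rho^{(i)}(a),\; {u^{(i-1)}}^{t} a\right) . \tag{13}$$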
3.4 Achieving Low Correlation Amplitudes

For a successful linear cryptanalysis attack, the cryptanalyst needs to know an
input parity and an output parity after all but a few rounds of the cipher that
have a correlation with an amplitude that is significantly larger than $2^{-n_b/2}$. To
avoid this, we choose the number of rounds so that there are no linear trails with
a correlation contribution above $n_k^{-1} 2^{-n_b/2}$.

This does not guarantee that there are no high correlations over $r$ rounds.
From Parseval's equality, it follows that for any output parity, the sum of the
squared amplitudes of the correlations with all input parities is 1. Under the
assumption that the output parity is equally correlated to all $2^{n_b}$ possible input
parities, the correlation to each of these input parities has amplitude $2^{-n_b/2}$.
In practice it is very unlikely that such a uniform distribution will be attained
and correlations will exist that are orders of magnitude higher than $2^{-n_b/2}$. This
also applies to the Boolean mapping formed by a cipher for a given value of the
cipher key. Hence, the presence of high correlations over (all but a few rounds
of) the cipher is a mathematical fact that can't be avoided by design.
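Parseval's equality can be observed directly on a small mapping. The sketch below is an illustration under assumptions: the 4-bit permutation is arbitrary, selection patterns are integer bit masks, and the function names are not from the text.

```python
from fractions import Fraction

# Illustrative 4-bit permutation (an assumption, not taken from the text).
H = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
     0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 4


def parity(w, a):
    """w^t a: XOR of the bits of a selected by the pattern w."""
    return bin(w & a).count("1") & 1


def correlation(h, u, v, n=N):
    """Correlation between the input parity u^t a and the output parity v^t h(a)."""
    agree = sum(1 for a in range(1 << n) if parity(u, a) == parity(v, h[a]))
    return Fraction(2 * agree, 1 << n) - 1


def squared_sum(h, v, n=N):
    """Sum over all input parities u of the squared correlation with output parity v."""
    return sum(correlation(h, u, v, n) ** 2 for u in range(1 << n))


def max_squared_amplitude(h, v, n=N):
    """Largest squared correlation amplitude for the output parity v."""
    return max(correlation(h, u, v, n) ** 2 for u in range(1 << n))
```

Because the squared amplitudes sum to 1 over 2^n input parities, at least one of them is 2^(-n) or larger, so the amplitude itself is at least 2^(-n/2).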

However, in the absence of local clustering of linear trails, high correlations
can only occur as the result of 'constructive interference' of many linear trails
that share the same initial and final selection patterns. Specifically, any such
correlation with an amplitude above $2^{-n_b/2}$ must be the result of at least $n_k$
different linear trails. The condition that a linear trail in this set contributes
constructively to the resulting correlation imposes a linear relation on the round
key bits. From the point that more than $n_k$ linear trails are combined, it is very
unlikely that all such conditions can be satisfied by choosing the appropriate
cipher key value.

The strong key-dependence of this interference makes it very unlikely that 
if a specific output parity has a high correlation with a specific input parity 
for a given key, that this will also be the case for another value of the key. In 
other words, although it follows from Parseval’s Theorem that high correlations 
over the cipher will exist whatever the number of rounds, the strong round 
key dependence of interference makes locating the input and output selection 
patterns for which high correlations occur practically infeasible. This is true if 
the key is known, and even more so if it is unknown. 

Again we conclude that restricting the amplitude of the correlation between 
input parities and output parities is a sound design strategy. However, it doesn’t 
result in a proof of security. 



3.5 Weight of a Trail 

$\gamma$ is a bricklayer mapping consisting of S-boxes. It is easy to see that the
correlation over $\gamma$ is the product of the correlations over the different S-box positions
for the given input and output selection patterns. We define the weight of a
correlation as the negative logarithm (to base 2) of its amplitude. The correlation weight for
an input selection pattern and output selection pattern is the sum of the
correlation weights of the different S-box positions. If the output selection pattern is
non-zero for a particular S-box position or bundle, we call this S-box or bundle
active.

Similarly, the weight of the difference propagation over $\gamma$ is defined as the
negative logarithm (to base 2) of its probability. The weight of the difference propagation
over $\gamma$ is given by the sum of the weights of the difference propagations of
the S-box positions for the given input difference pattern and output difference
pattern. If the input difference pattern is non-zero for a particular S-box position
or bundle, we call this S-box or bundle active.

The correlation contribution of a linear trail is the product of the correlation 
of all its steps. The weight of such a trail is defined as the sum of the weights 
of its steps. As the weight of a step is the sum of the weights of its active S-box 
positions, the weight of a linear trail is the sum of that of its active S-boxes. An 
upper limit to the correlation is a lower limit to the weight per S-box. Hence, the 
weight of a linear trail is equal to or larger than the number of active bundles 
in all its selection patterns times the minimum (correlation) weight per S-box. 
We call the number of active bundles in a pattern or a trail its bundle weight. 

A differential trail is defined by a series of difference patterns. The weight 
of such a trail is the sum of the weights of the difference patterns of the trail. 
Completely analogous to linear trails, the weight of a differential trail is equal 
to or larger than the number of active S-boxes times the minimum (differential) 
weight per S-box. 



3.6 Wide Trails 

The reasoning above suggests two possible mechanisms to eliminate low-weight 
trails: 

1. Choose S-boxes with high minimum differential and correlation weight. 

2. Design the round transformation in such a way that there are no relevant trails
with low bundle weight.

The maximum correlation amplitude of an m-bit invertible S-box is above
$2^{-m/2}$, yielding an upper bound for the minimum (correlation) weight of $m/2$. The
maximum difference propagation probability is at least $2^{2-m}$, yielding an upper
bound for the minimum (differential) weight of $m - 2$. This seems to suggest
that one should take large S-boxes. This is not the approach we follow in the
wide trail design strategy.

Instead of spending most of the resources on large S-boxes, the wide trail 
strategy aims at designing the round transformation(s) such that there are 
no trails with a low bundle weight. In ciphers designed by the wide trail 
strategy, a relatively large amount of resources is spent in the linear step to 
provide high multiple-round diffusion. 
4 Diffusion

Diffusion is the term introduced by C. Shannon to denote the quantitative 
spreading of information [9] . Diffusion is a rather vague concept the exact mean- 
ing of which strongly depends on the context in which it is used. We will explain 
now what we mean by diffusion in the context of the wide trail strategy. 

Inevitably, the mapping $\gamma$ provides some interaction between the different
bits within the bundles that may be referred to as diffusion. However, it does
not provide any inter-bundle interaction: difference propagation and correlation
over $\gamma$ stay confined within the bundles. In the context of the wide trail strategy,
it is not this kind of diffusion we are interested in. We use the term diffusion to
indicate properties of a mapping that increase the minimum bundle weight of
linear and differential trails. In this sense, all diffusion is realized by $\lambda$; $\gamma$ does
not provide any diffusion at all.

The bundle weight of a single-round trail,
differential or linear, is obviously equal to the number of active bundles at its input. It
follows that the minimum bundle weight of a single-round trail is 1, independent
of $\lambda$. The situation becomes interesting as soon as we consider two-round trails.

4.1 Branch Numbers and Two-Round Trails 

In two-round trails, the bundle weight is the sum of the number of active bundles 
in the (selection or difference) patterns at the beginning of the first and the input 
of the second round. We will see that the bundle weight of two-round trails can 
be expressed elegantly by using branch numbers. 

Consider a partition $\alpha$ that divides the different bit positions of a state into
$n_\alpha$ sets called $\alpha$-sets. An example of this is the partition that divides the bits
into a number of bundles. The weight of a state value with respect to a partition
$\alpha$ is equal to the number of $\alpha$-sets that have at least one non-zero bit. This
is denoted by $w_\alpha(a)$. If this is applied to a difference pattern $a'$, $w_\alpha(a')$ is the
number of active $\alpha$-sets in $a'$. Applied to a selection pattern $v$, $w_\alpha(v)$ is the
number of active $\alpha$-sets in $v$. If $\alpha$ is the partition that forms the bundles, $w_\alpha(a)$
is the number of active bundles in the pattern $a$ and is denoted by $w_b(a)$.

We make a distinction between the differential and the linear branch number 
of a transformation. 

Definition 3. The differential branch number of a transformation $\phi$ with
respect to a partition $\alpha$ is defined by

$$\mathcal{B}_d(\phi, \alpha) = \min_{a,\, b \ne a} \{ w_\alpha(a \oplus b) + w_\alpha(\phi(a) \oplus \phi(b)) \} . \tag{14}$$

For a linear transformation $\lambda(a) \oplus \lambda(b) = \lambda(a \oplus b)$, and (14) reduces to:

$$\mathcal{B}_d(\lambda, \alpha) = \min_{a' \ne 0} \{ w_\alpha(a') + w_\alpha(\lambda(a')) \} . \tag{15}$$

The differential branch number of a Boolean transformation
$\phi$ with respect to a partition $\alpha$ can be bounded in terms of $n_\alpha$, since the output difference
corresponding to an input difference with a single non-zero bundle can have at most
weight $n_\alpha$. Therefore, the differential branch number of $\phi$ with respect to $\alpha$ is
upper bounded by

$$\mathcal{B}_d(\phi, \alpha) \le n_\alpha + 1 . \tag{16}$$
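For a very small state, (14) can be evaluated by exhaustive search. The sketch below is a toy illustration under assumptions that are not from the text: the state consists of two 2-bit bundles viewed as elements of GF(4), and the hypothetical linear map is b0 = a0 + a1, b1 = a0 + 2·a1 over GF(4).

```python
from itertools import product

# Multiplication table of GF(4) on the values 0..3 (x^2 = x + 1, x encoded as 2).
GF4_MUL = [[0, 0, 0, 0],
           [0, 1, 2, 3],
           [0, 2, 3, 1],
           [0, 3, 1, 2]]


def lam(a):
    """Hypothetical linear map on two bundles: b0 = a0 + a1, b1 = a0 + 2*a1."""
    a0, a1 = a
    return (a0 ^ a1, a0 ^ GF4_MUL[2][a1])


def identity(a):
    return tuple(a)


def weight(a):
    """Number of active (non-zero) bundles."""
    return sum(1 for x in a if x != 0)


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def differential_branch_number(phi, n_sets=2, set_values=4):
    """Evaluate (14) by trying every pair of distinct states."""
    states = list(product(range(set_values), repeat=n_sets))
    return min(weight(xor(a, b)) + weight(xor(phi(a), phi(b)))
               for a in states for b in states if a != b)
```

The toy map reaches the upper bound (16), which is 3 for two bundles, whereas the identity map only reaches 2.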

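Analogous to the differential branch number, we can define the linear branch
number.

Definition 4. The linear branch number of a transformation $\phi$ with respect to
$\alpha$ is given by

$$\mathcal{B}_l(\phi, \alpha) = \min_{\alpha,\, \beta,\; C(\alpha^t x,\, \beta^t \phi(x)) \ne 0} \{ w_\alpha(\alpha) + w_\alpha(\beta) \} . \tag{17}$$

Here the selection patterns $\alpha$ and $\beta$ under the minimum are not to be confused
with the partition $\alpha$ in the subscript of $w$.

Many of the following discussions are valid both for differential and linear
branch numbers and both $\mathcal{B}_d$ and $\mathcal{B}_l$ are denoted simply by $\mathcal{B}$. Moreover, in
many cases the partition is clear from the context and $\mathcal{B}(\phi, \alpha)$ is expressed as
$\mathcal{B}(\phi)$.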
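The linear branch number of a small mapping can be found in the same exhaustive manner. The sketch below makes several assumptions that are not from the text: a 4-bit state split into two 2-bit bundles, a hypothetical GF(4)-linear map, selection patterns stored as bit masks, and the trivial pair of all-zero selection patterns excluded from the minimum.

```python
# Multiplication table of GF(4) on the values 0..3 (x^2 = x + 1, x encoded as 2).
GF4_MUL = [[0, 0, 0, 0],
           [0, 1, 2, 3],
           [0, 2, 3, 1],
           [0, 3, 1, 2]]
N_BITS = 4  # two 2-bit bundles: bits 0-1 and bits 2-3


def lam(x):
    """Hypothetical linear map: b0 = a0 + a1, b1 = a0 + 2*a1 over GF(4)."""
    a0, a1 = x & 3, x >> 2
    return (a0 ^ a1) | ((a0 ^ GF4_MUL[2][a1]) << 2)


def parity(w, a):
    """w^t a: XOR of the bits of a selected by the pattern w."""
    return bin(w & a).count("1") & 1


def weight(v):
    """Number of active bundles in a 4-bit pattern."""
    return int((v & 3) != 0) + int((v >> 2) != 0)


def has_correlation(phi, u, v, n=N_BITS):
    """True if the input parity u and the output parity v have non-zero correlation."""
    agree = sum(1 for x in range(1 << n) if parity(u, x) == parity(v, phi(x)))
    return 2 * agree != (1 << n)


def linear_branch_number(phi, n=N_BITS):
    """Minimum total bundle weight over correlated, non-trivial pattern pairs."""
    return min(weight(u) + weight(v)
               for u in range(1 << n) for v in range(1 << n)
               if (u, v) != (0, 0) and has_correlation(phi, u, v, n))
```

For this particular map the linear and differential branch numbers coincide; in general they need not.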

4.2 Some Properties 

In general, the linear and differential branch number of a transformation with
respect to a partition are not equal. From the symmetry of Definitions 3 and 4 it
follows that the branch number of a transformation and that of its inverse are
the same. Moreover, we have the following properties:

- a (difference or selection) pattern $a$ is not affected by a key addition and
  hence its weight $w_\alpha(a)$ is not affected. This property holds independently of
  the partition $\alpha$.

- a bricklayer permutation compatible with $\alpha$ cannot turn an active $\alpha$-set
  into a non-active one or vice versa. Hence, it does not affect the weight
  $w_\alpha(a)$.

Assume we have a transformation $\phi$ composed of a transformation $\phi_1$ and a
bricklayer transformation $\phi_\alpha$ operating on $\alpha$-sets, i.e., $\phi = \phi_1 \circ \phi_\alpha$. As $\phi_\alpha$ does
not affect the number of active $\alpha$-sets in a propagation pattern, the branch
numbers of $\phi$ and $\phi_1$ are the same. More generally, if propagation of patterns is
analyzed at the level of $\alpha$-sets, bricklayer transformations compatible with $\alpha$
may be ignored.

If we apply this to the bundle weight of a $\gamma\lambda$ round transformation $\rho$, it
follows immediately that the (linear or differential) bundle branch number of $\rho$
is that of its linear part $\lambda$.



4.3 A Two-Round Propagation Theorem 

The following theorem relates the value of $\mathcal{B}(\lambda)$ to a bound on the number of
active bundles in a trail. The proof is valid both for linear and differential trails:
in the case of linear trails $\mathcal{B}$ stands for $\mathcal{B}_l$ and in the case of differential trails $\mathcal{B}$
stands for $\mathcal{B}_d$.

Theorem 1 (Two-Round Propagation Theorem).
For a key-alternating block cipher with a $\gamma\lambda$ round structure, the number of active
bundles of any two-round trail is lower bounded by the (bundle) branch number
of $\lambda$.

Proof. Figure 2 depicts two rounds. Since the transformations $\gamma$ and $\sigma[k]$ operate
on each bundle individually, they do not affect the propagation of patterns. Hence
it follows that $w_b(a^{(1)}) + w_b(a^{(2)})$ is only bounded by the properties of the linear
transformation $\lambda$ of the first round. Definitions 3 and 4 imply that the sum of the
active bundles before and after $\lambda$ of the first round is lower bounded by $\mathcal{B}(\lambda)$.
□



Fig. 2. Transformations relevant in the proof of Theorem 1.
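The statement of Theorem 1 can be checked exhaustively on a toy cipher. The sketch below is an illustration under assumptions not taken from the text: two 2-bit bundles, an arbitrary 2-bit S-box, and a hypothetical GF(4)-linear mixing step whose bundle branch number is 3. It enumerates every pair of distinct inputs and every round key, and records the number of active bundles at the input of the first and of the second round.

```python
from itertools import product

# Multiplication table of GF(4) on the values 0..3 (x^2 = x + 1, x encoded as 2).
GF4_MUL = [[0, 0, 0, 0],
           [0, 1, 2, 3],
           [0, 2, 3, 1],
           [0, 3, 1, 2]]
SBOX = [1, 3, 0, 2]  # arbitrary 2-bit permutation (assumption)
STATES = list(product(range(4), repeat=2))


def gamma(a):
    """Bricklayer step: the S-box applied to each bundle."""
    return tuple(SBOX[x] for x in a)


def lam(a):
    """Hypothetical linear step with bundle branch number 3."""
    a0, a1 = a
    return (a0 ^ a1, a0 ^ GF4_MUL[2][a1])


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def weight(a):
    """Number of active (non-zero) bundles."""
    return sum(1 for x in a if x != 0)


def min_two_round_bundle_weight(key):
    """Minimum of w_b(a1) + w_b(a2) over all pairs of distinct inputs."""
    best = None
    for x, y in product(STATES, STATES):
        if x == y:
            continue
        a1 = xor(x, y)                               # difference entering round 1
        x2 = xor(lam(gamma(x)), key)                 # round 1 followed by key addition
        y2 = xor(lam(gamma(y)), key)
        a2 = xor(x2, y2)                             # difference entering round 2
        w = weight(a1) + weight(a2)
        best = w if best is None else min(best, w)
    return best
```

The minimum equals the branch number of the linear step for every key, which illustrates that neither the S-boxes nor the key addition influence the bound.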



5 An Efficient Key-Alternating Structure

In trails of more than two rounds, the desired diffusion properties of $\rho$ are less
trivial. It is clear that any 2n-round trail can be decomposed into n two-round trails
and hence that its bundle weight is lower bounded by n times the branch number
of $\rho$. The 'greedy' approach to eliminate low-weight trails is to consider
Theorem 1 only and to design a round transformation with a maximum branch
number. However, transformations that provide high branch numbers have a
tendency to have a high implementation cost. More efficient designs can be achieved
in the following way. We build a key-alternating block cipher that consists of an
alternation of two different round transformations defined by:

$$\rho^a = \theta \circ \gamma \tag{18}$$

$$\rho^b = \Theta \circ \gamma \tag{19}$$

The transformation $\gamma$ is defined as before and operates on $n_t$ m-bit bundles.

5.1 The Diffusion Transformation $\theta$

With respect to $\theta$, the bundles of the state are grouped into a number of columns
by a partition $\Xi$ of the index space $\mathcal{I}$. We denote a column by $\xi$ and the number
of columns by $n_\Xi$. The column containing an index $i$ is denoted by $\xi(i)$ and the
number of indices in a column $\xi$ by $n_\xi$. The size of the columns relates to the
block length by

$$m \sum_{\xi \in \Xi} n_\xi = m n_t .$$

$\theta$ is a bricklayer mapping with component mappings that each operate on a
column. Within each column, bundles are linearly combined. We have

$$\theta : b = \theta(a) \Leftrightarrow b_i = \sum_{j \in \xi(i)} C_{i,j} a_j . \tag{20}$$

The bricklayer transformation $\theta$ only needs to realize diffusion within the
columns and hence has an implementation cost that is much lower.




Fig. 3. The diffusion transformation $\theta$



Similar to active bundles, we can speak of active columns. The number of
active columns of a propagation pattern $a$ is denoted by $w_\Xi(a)$. The round
transformation $\rho^a = \theta \circ \gamma$ is a bricklayer transformation operating independently on a
number of columns. Taking this bricklayer structure into account we can extend
the result of Section 4.1 slightly. The branch number of $\theta$ is given by the
minimum branch number of its component transformations. Applying (16) to the
component mappings defined by the coefficient matrices in (20) results in the following upper
bound:

$$\mathcal{B}(\theta) \le \min_\xi n_\xi + 1 . \tag{21}$$

Hence, the smallest column imposes the upper limit for the branch number. The
Two-Round Propagation Theorem (Theorem 1) implies the following lemma.

Lemma 1. The bundle weight of any two-round trail in which the first round has
a $\gamma\theta$ round transformation is lower bounded by $N \mathcal{B}(\theta)$, where $N$ is the number
of active columns at the input of the second round.

Proof. Theorem 1 can be applied to each of the component mappings of the
bricklayer mapping separately. For each active column there are at least $\mathcal{B}(\theta)$
active bundles in the two-round trail. □
5.2 The Linear Transformation $\Theta$

$\Theta$ mixes bundles across columns.

$$\Theta : b = \Theta(a) \Leftrightarrow b_i = \sum_j C_{i,j} a_j \tag{22}$$

The goal of $\Theta$ is to provide inter-column diffusion. The design criterion for $\Theta$ is
to have a high branch number with respect to $\Xi$. This is denoted by $\mathcal{B}(\Theta, \Xi)$
and called its column branch number.

5.3 A Lower Bound on the Bundle Weight of 4-Round Trails 

The combination of the bundle branch number of $\theta$ and the column branch
number of $\Theta$ allows us to prove a lower bound on the bundle weight of any trail
over 4 rounds starting with $\rho^a$.

Theorem 2 (Four-Round Propagation Theorem for $\theta\Theta$ construction).
For a key-alternating block cipher with round transformations as defined in (18)
and (19), the bundle weight of any trail over

$$\rho^b \circ \rho^a \circ \rho^b \circ \rho^a$$

is lower bounded by $\mathcal{B}(\theta) \times \mathcal{B}(\Theta, \Xi)$.

Proof. Figure 4 depicts four rounds. As the key additions play no role in the
propagation of patterns, they have been left out. It is easy to see that the linear
transformation of the fourth round plays no role. The sum of the number of active
columns in $a^{(2)}$ and $a^{(3)}$ is lower bounded by $\mathcal{B}(\Theta, \Xi)$. According to Lemma 1,
for each active column in $a^{(2)}$ there are at least $\mathcal{B}(\theta)$ active bundles in the
corresponding columns of $a^{(1)}$ and $a^{(2)}$. Similarly, for each active column in $a^{(3)}$
there are at least $\mathcal{B}(\theta)$ active bundles in the corresponding columns of $a^{(3)}$
and $a^{(4)}$. Hence the total number of active bundles is lower bounded by $\mathcal{B}(\theta) \times
\mathcal{B}(\Theta, \Xi)$. □

Fig. 4. Relevant transformations for the proof of Theorem 2.

5.4 An Efficient Construction for $\Theta$

As opposed to $\theta$, $\Theta$ does not operate on different columns
independently and hence may have a much higher implementation cost. In this
section we present a construction of $\Theta$ in terms of $\theta$ and bundle
transpositions denoted by $\pi$. We define

$$\Theta = \pi \circ \theta \circ \pi. \tag{23}$$

In the following we will define $\pi$ and prove that if $\pi$ is well chosen the
column branch number of $\Theta$ can be made equal to the bundle branch number
of $\theta$.

The bundle transposition $\pi$. The bundle transposition $\pi$ is defined as

$$\pi : b = \pi(a) \Leftrightarrow b_i = a_{p(i)}, \tag{24}$$

with $p(i)$ a permutation of the index space $\mathcal{I}$. The inverse of $\pi$
is defined by $p^{-1}(i)$. Observe that a bundle transposition $\pi$ does not
affect the bundle weight of a propagation pattern and hence that the branch
number of a transformation is not affected if it is composed with $\pi$.

Contrary to $\theta$, $\pi$ provides inter-column diffusion. Intuitively, good
diffusion for $\pi$ would mean that it distributes the different bundles of a
column to as many different columns as possible. We say $\pi$ is
diffusion-optimal if the different bundles in each column are distributed over
all different columns. More formally, we have:

Definition 5. $\pi$ is diffusion-optimal if and only if

$$\forall i, j \in \mathcal{I},\ i \ne j : (\xi(i) = \xi(j)) \Rightarrow (\xi(p(i)) \ne \xi(p(j))). \tag{25}$$

It is easy to see that this implies the same condition for $\pi^{-1}$. A
diffusion-optimal bundle transposition $\pi$ implies

$$w_s(\pi(a)) \ge \max_j\, w_b(a_{\xi_j}).$$

Therefore a diffusion-optimal transformation can only exist if
$n_\Xi \ge \max_j(n_{\xi_j})$. In words, $\pi$ can only be diffusion-optimal if
there are at least as many columns as there are bundles in the largest column.
If $\pi$ is diffusion-optimal, we can prove that the column branch number of the
mapping $\Theta$ is lower bounded by the branch number of $\theta$.
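
Definition 5 can be checked mechanically for a concrete transposition. The following is a minimal sketch, assuming a rectangular state whose bundle indices are (row, column) pairs and a hypothetical ShiftRows-style transposition that rotates each row by a fixed offset; the function names and the state layout are illustrative assumptions, not part of the construction above.

```python
from itertools import combinations

def is_diffusion_optimal(p, column_of):
    """Check Definition 5: bundles sharing a column must land in distinct columns.

    p maps each bundle index i to p(i); column_of maps an index to its column.
    """
    for i, j in combinations(p, 2):
        if column_of(i) == column_of(j) and column_of(p[i]) == column_of(p[j]):
            return False
    return True

def row_shift(n_rows, n_cols, offsets):
    """Hypothetical row-rotation transposition: b[r, c] = a[r, (c + offsets[r]) % n_cols]."""
    return {(r, c): (r, (c + offsets[r]) % n_cols)
            for r in range(n_rows) for c in range(n_cols)}

def column_of(index):
    """Column of a (row, column) bundle index."""
    return index[1]

# Four rows, four columns, pairwise distinct offsets.
square_state = is_diffusion_optimal(row_shift(4, 4, [0, 1, 2, 3]), column_of)
# Two rows share an offset, so two bundles of a column stay together.
repeated_offset = is_diffusion_optimal(row_shift(4, 4, [0, 1, 1, 3]), column_of)
# Fewer columns than bundles per column: cannot be diffusion-optimal.
too_few_columns = is_diffusion_optimal(row_shift(4, 3, [0, 1, 2, 3]), column_of)
```

The third case illustrates the necessary condition stated above: with four bundles per column and only three columns, at least two bundles of a column must end up in the same column.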

Lemma 2. If $\pi$ is a diffusion-optimal transposition of bundles, the column
branch number of $\pi \circ \phi \circ \pi$ is lower bounded by the bundle branch
number of $\phi$.

Proof. We refer to Figure 5 for the notations used in this proof. We have to
demonstrate that

$$w_s(a) + w_s(d) \ge \mathcal{B}(\phi).$$

For any active column in $b$, the number of active bundles in that column and
the corresponding column of $c$ is at least $\mathcal{B}(\phi)$. $\pi$ moves all
active bundles in an active column of $c$ to different columns in $d$ and
$\pi^{-1}$ moves all active bundles in an active column of $b$ to different
columns in $a$. It follows that the sum of the number of active columns in $a$
and in $d$ is lower bounded by the bundle branch number of $\phi$. □

[Figure 5: $a \to \pi \to b \to \phi \to c \to \pi \to d$]

Fig. 5. Transformations relevant in the proof of Lemma 2.

6 Using Identical Round Transformations

The efficient structure described in Section 5 uses two different round
transformations. It is possible to define a block cipher structure with only one
round transformation that achieves the same bound. This is the round structure
used in the AES and related ciphers. The advantage of having a single round
transformation is a reduction in program size in software implementations and in
chip area in dedicated hardware implementations. For this purpose, $\lambda$ is
composed of two types of mappings:

– $\theta$: the linear bricklayer mapping that provides high local diffusion, as
defined in Section 5.1, and

– $\pi$: the transposition mapping that provides high dispersion, as defined in
Section 5.4.

Hence we have for the round transformation:

$$\rho^c = \theta \circ \pi \circ \gamma. \tag{26}$$

Figure 6 gives a schematic representation of the different transformations of a
round. These component transformations are defined in such a way that they
impose strict lower bounds on the number of active S-boxes in four-round trails.
For two-round trails it can be seen that the number of active bundles is lower
bounded by $\mathcal{B}(\rho^c) = \mathcal{B}(\lambda) = \mathcal{B}(\theta)$. For
four rounds, we can prove the following important theorem:

Theorem 3 (Four-Round Propagation Theorem). For a key-iterated block cipher with
a $\gamma\pi\theta$ round transformation and diffusion-optimal $\pi$, the number
of active S-boxes in a four-round trail is lower bounded by
$(\mathcal{B}(\theta))^2$.

[Figure 6]

Fig. 6. Schematic representation of the different steps of a $\gamma\pi\theta$
round transformation, followed by a key addition.

Proof. Firstly, we show that the transformation formed by 4 applications of the
round transformation as defined in (26) is equivalent to four rounds of the
construction with $\rho^a$ and $\rho^b$ as defined in (18) and (19). For
simplicity, we leave out the applications of the key additions, but the proof
works in the same way if the key additions are present. Let $A$ be defined as:

$$A = \rho^c \circ \rho^c \circ \rho^c \circ \rho^c
    = (\theta \circ \pi \circ \gamma) \circ (\theta \circ \pi \circ \gamma) \circ (\theta \circ \pi \circ \gamma) \circ (\theta \circ \pi \circ \gamma).$$

$\gamma$ is a bricklayer mapping, operating on every bundle separately and
operating independently of the bundle’s position. Therefore $\gamma$ commutes
with $\pi$, which only moves the bundles to different positions. We get:

$$A = (\theta \circ \gamma) \circ (\pi \circ \theta \circ \pi \circ \gamma) \circ (\theta \circ \gamma) \circ (\pi \circ \theta \circ \pi \circ \gamma)
    = \rho^a \circ \rho^b \circ \rho^a \circ \rho^b,$$

with $\Theta$ of $\rho^b$ defined exactly as in (23). Now we can apply Lemma 2
and Theorem 2 to finish the proof. □
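
As a worked instance of Theorem 3 (an illustration only, not part of the proof), assume the published Rijndael parameters: a column mixing step with branch number 5, a diffusion-optimal row shift on the 128-bit state, and an S-box with maximum difference propagation probability $2^{-6}$ and maximum correlation amplitude $2^{-3}$. Under these assumptions the bound evaluates as follows.

```latex
\mathcal{B}(\theta) = 5
\;\Longrightarrow\;
\#\{\text{active S-boxes in any four-round trail}\} \ge (\mathcal{B}(\theta))^2 = 25,
\\[4pt]
\text{probability of a four-round differential trail} \le (2^{-6})^{25} = 2^{-150},
\\[4pt]
|\text{correlation of a four-round linear trail}| \le (2^{-3})^{25} = 2^{-75}.
```

These figures bound individual trails only; they say nothing about the clustering of trails.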



7 Conclusion and Open Problems 

We have shown how the application of the wide trail design strategy leads to the 
definition of a round transformation as the one used in Rijndael. The proposed 
round transformation allows us to give provable bounds on the correlation of 
linear trails and the weight of differential trails while at the same time allowing 
efficient implementations. 

An interesting open problem is the effect of trail clustering. Theorems 1, 2
and 3 give lower bounds on the weight of trails. As mentioned in Section 3,
the probability of an input-output difference propagation and the correlation
between input parities and output parities are each a sum over many trails. If
the trails indeed follow a Poisson distribution, then the results can be applied
straightforwardly. However, it has already been observed that in some cases the
trails do not follow a Poisson distribution. Instead, they tend to cluster and as
a result the probability of a difference propagation can be significantly higher
[6]. A similar effect for correlations has been studied in [8]. It remains an open
problem whether trail clustering occurs and impacts the security of the cipher
structure described here.

Abstract. A major problem of mobile agents is their inability to
authenticate transactions in a hostile environment. Users will not wish to
equip agents with their private signature keys when the agents may
execute on untrusted platforms. Undetachable signatures were introduced
to solve this problem by allowing users to equip agents with the means
to create signatures for tightly constrained transactions, using
information especially derived from the user’s private signature key. However,
the problem remains that a platform can force an agent to commit
to a sub-optimal transaction. In parallel with the work on undetachable
signatures, much work has been performed on threshold signature
schemes, which allow signing power to be distributed across multiple
agents, thereby reducing the trust placed in any single entity. We combine
these notions and introduce the concept of an undetachable threshold
signature scheme, which enables constrained signing power to be distributed
across multiple agents, thus reducing the necessary trust in single agent
platforms. We also provide an RSA-based example of such a scheme
based on a combination of Shoup’s threshold signature scheme [1] and
Kotzanikolaou et al.’s undetachable signature scheme [2].



1 Introduction 

A digital signature is the electronic counterpart to a written signature. Thus one 
way to commit to an electronic transaction is by the use of a digital signature. 
Recently, the use of mobile agents to commit to transactions for a user has 
become a topic of interest. Mobile agents, however, face the problem of having 
to execute in a hostile environment where the host executing the agent has access 
to all the data that an agent has stored (for instance the private signature key) . 
Consequently, the problem of allowing an agent to sign a transaction on behalf 
of a user is one of interest. 

Undetachable signatures were first proposed by Sander and Tschudin [3] to
solve this problem, and are based on the idea of computing with encrypted
functions. The host executes a function $s \circ f$, where $f$ is an encrypting
function, without having access to the user’s private signature function $s$.
The security of the method lies in the encrypting function $f$. Whilst Sander
and Tschudin were unable to propose a satisfactory scheme, more recently
Kotzanikolaou, Burmester and Chrissikopoulos [2] have presented an RSA-based
scheme which appears to be secure.

bile & Personal Communications, Mobile VCE, www.mobilevce.com, whose funding
support, including that of the EPSRC, is gratefully acknowledged. More detailed
technical reports on this research are available to Industrial Members of Mobile
VCE.

B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 239-244, 2001.
© Springer-Verlag Berlin Heidelberg 2001

N. Borselius, C.J. Mitchell, and A. Wilson

The idea of an undetachable signature is as follows. Suppose a user wishes
to purchase a product from an electronic shop. The agent can commit to the
transaction only if the agent can use the signature function $s$ of the user.
However, as the server where the agent executes may be hostile, the signature
function is protected by a function $f$ to obtain $g = s \circ f$. The user then
gives the agent the pair $(f, g)$ of functions as part of its code. The server
then executes the pair $(f, g)$ on an input $x$ (where $x$ encodes transaction
details) to obtain the undetachable signature pair

$$f(x) = m \quad\text{and}\quad g(x) = s(m).$$

The pair of functions allows the agent to create signatures for the user whilst
executing on the server without revealing $s$ to the server. The parameters of
the function $f$ are such that the output of $f$ includes the user’s constraints.
Thus $m$ links the constraints of the customer to the bid of the server. This is
then certified by the signature on this message. The main point is that the
server cannot sign arbitrary messages, because the function $f$ is linked to the
user’s constraints.

However, one problem with this approach is that the agent is still given the
power to sign any transaction it likes, subject to the requirement that the
transaction must be consistent with the constraints used to construct $f$. Thus,
for example, whilst the constraints may limit the nature and/or value of a
transaction, a malicious host may force an agent to commit to a transaction much
less favourable than could be achieved.

Thus, to protect further against malicious hosts, a user may wish to use 
more than one agent and have the agents agree on a bid before committing to it. 
Hence, a user may send out n agents with the criteria that k of them must agree 
before committing to a purchase. The obvious solution to such a requirement 
is to employ a threshold signature scheme, meaning that agents can all sign the 
bid they think ‘best’ given the user’s requirements, and then, on receipt of a 
sufficient number of these bids, the user’s signature can be reconstructed. 

However, such a scheme does not possess the means to constrain the power 
given to a quorum of agents. This motivates the introduction of the concept of 
an undetachable threshold signature which both distributes signature authority 
across multiple agents and simultaneously constrains the signatures that may be 
constructed. 

The rest of the paper is as follows. In Section 2 we outline the undetachable 
signature scheme of [2], and in Section 3 we briefly review threshold signatures 
and give a method of Shoup [1] to construct such a scheme. Finally, in Section 
4 we define the concept of an undetachable threshold signature, and show how 
an example of such a scheme may be obtained by combining the schemes of [2] 
and [1]. 

2 RSA Undetachable Signatures

We briefly present the RSA undetachable signature scheme given in [2]. The user
sets up an RSA signature pair in the usual manner, that is, the user selects an
RSA modulus $n$ which is the product of two primes $p$ and $q$, and a number $e$
such that $1 < e < \phi(n) = (p-1)(q-1)$ and $\gcd(e, \phi(n)) = 1$. Let $d$ be
such that $1 < d < \phi(n)$ and $ed \equiv 1 \pmod{\phi(n)}$. The user then
publishes the verification key $(n, e)$ and keeps $d$ as the private signing key.

Let $I$ be an identifier for the user and $R$ the encoded requirements of the
user for a purchase (we assume that $R$ is encoded in a manner which is
understood by all parties). Let $h$ be an appropriate hash function (i.e. one
giving a value in $\mathbb{Z}_n$). The user then forms $H = h(I, R)$.

The user then gives an agent the user identifier, the requirements, and the
pair $(H, G)$ as its undetachable signature, where $G = H^d \bmod n$. To sign a
bid $B$ (which we assume is in the same format as $R$), the executing host
calculates $x = h(B)$. The undetachable signature is then the pair
$(H^x, G^x)$. We note that

$$G^x \equiv H^{dx} \equiv (H^x)^d \pmod{n},$$

so that the server has signed the value $H^x$ with the user’s private key.

We briefly note that this scheme appears secure, and a proof of this fact
is given in [2]. To forge a signature on a different set of requirements $R'$, a
malicious host would need to forge $H' = h(I, R')$, $G' = (H')^d$ and $(G')^x$.
Clearly the only work needed here is to forge $G'$, and this would require
knowledge of the user’s private key. Having said this, there is nothing in this
scheme to prevent a host from signing more than one bid, or presenting a bid
that just meets the requirements of the user (as opposed to a possibly better
standard offer).
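
The data flow of this scheme can be traced with a toy example. The sketch below is an illustration under stated assumptions: the primes are far too small for real use, the hash `h` is SHA-256 reduced modulo $n$ (a stand-in, not the hash of [2]), and the identifier, requirement and bid strings are invented. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
from math import gcd

def h(n, *parts):
    """Toy hash into Z_n (assumption: SHA-256 reduced mod n, not the hash of [2])."""
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

# User: toy RSA key pair (illustrative sizes only).
p, q, e = 1009, 1013, 17
n, phi = p * q, (p - 1) * (q - 1)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)                      # private signing key, stays with the user

# User equips the agent with (H, G), derived from identifier I and requirements R.
identifier, requirements = "alice", "price<=100"
H = h(n, identifier, requirements)
G = pow(H, d, n)

# Executing host: signs a bid B using only (H, G).
bid = "bid: price=95"
x = h(n, bid)
m, signature = pow(H, x, n), pow(G, x, n)

# Verifier: recompute H and x, then check the RSA verification equation.
H_check = h(n, identifier, requirements)
valid = (m == pow(H_check, h(n, bid), n)) and (pow(signature, e, n) == m)
```

The host never sees `d`, yet `signature` verifies as an RSA signature on `m`, which is tied to the requirements through `H`.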



3 Threshold Signatures 

The idea of a threshold scheme is to take a secret, and divide it into pieces called 
shares which are distributed among a group of entities. Then any subset of these 
entities of a given size can reconstruct this secret, but a smaller group can learn 
no information about the secret. An example of such a scheme is given in [4]. 

Threshold cryptography was first proposed by Desmedt [5]. One important
type of threshold cryptosystem is known as a threshold signature. In such a
scheme, any set of $k$ parties from a total of $l$ parties can sign any document,
and any coalition of fewer than $k$ parties cannot sign a document. Such schemes
tend to rely on a combiner which is not necessarily trusted. Several schemes
have been proposed based on both El Gamal and RSA cryptography (see, for
example, [1] for a short survey). Recently Shoup [1] proposed an RSA scheme
which is as efficient as possible: the scheme uses only one level of secret
sharing, each server sends a single part signature to a combiner, and each server
must do work that is equivalent, up to a constant factor, to computing a single
RSA signature.

Although in some sense not perfect as a threshold signature scheme (as it
relies on a trusted party to form the shares), this scheme is ideal in our
setting, where the user dispatching the agent will always (one would hope) trust
themselves. (Note that an alternative scheme without a trusted dealer is given
in [6]. This scheme also improves on [1] by not relying on an RSA modulus made up
of ‘safe primes’.) An example of an El Gamal based scheme is given in [7].

We next briefly outline the threshold signature scheme of [1]. 

The user (dealer) forms the following:

– An RSA modulus $n = pq$ where $p = 2p' + 1$ and $q = 2q' + 1$ are safe primes,
i.e. $p'$, $q'$ are prime.

– A public exponent $e$, where $e$ is prime, and a private key $d$, where
$de \equiv 1 \pmod{p'q'}$.

– A polynomial $f(x) = \sum_{i=0}^{k-1} a_i x^i$ where $a_0 = d$ and
$a_i \in \{0, \ldots, p'q' - 1\}$ (selected at random) for $1 \le i < k$.

– $L(n)$, the bit length of $n$, and $L_1$, a secondary security parameter;
Shoup [1] suggests $L_1 = 128$.

– The $l$ signature key shares $s_i$ of the scheme, where each $s_i$ is selected
at random from the set $\{s \mid 0 < s < \ldots,\ s \equiv f(i) \bmod (p'q')\}$.

– The verification keys $VK = v$ and $VK_i = v_i = v^{s_i}$ where $v \in Q_n$,
the subgroup of squares of $\mathbb{Z}_n^*$.

– A global hash function $h$ mapping into $\mathbb{Z}_n^*$.

– A second hash function $g$ whose output is an $L_1$-bit integer.

In this scheme a shareholder signs a message $m$ in the following manner.
Firstly the shareholder calculates the hash of the message, i.e. $x = h(m)$. The
signature share of a shareholder $i$ then consists of

$$x_i = x^{2\Delta s_i}$$

and a ‘proof of correctness’ (note that $\Delta = l!$). The proof of correctness
is basically just a proof that the discrete logarithm of $x_i^2$ to the base
$\tilde{x} = x^{4\Delta}$ is the same as the discrete logarithm of $v_i$ to the
base $v$. Let $L(n)$ be the bit length of $n$. The shareholder then chooses a
random number $r \in \{0, \ldots, 2^{L(n)+2L_1} - 1\}$ and computes

$$v' = v^r, \quad x' = \tilde{x}^r, \quad c = g(v, \tilde{x}, v_i, x_i^2, v', x'), \quad z = s_i c + r.$$

The proof of correctness is then $(z, c)$, which can be verified by calculating

$$c = g(v, \tilde{x}, v_i, x_i^2, v^z v_i^{-c}, \tilde{x}^z x_i^{-2c}).$$


To combine the shares the combiner acts as follows. Assume we have valid
shares from a set $S = \{i_1, \ldots, i_k\}$ of shareholders. The combiner computes

$$\lambda^S_{0,j} = \Delta \cdot \frac{\prod_{i \in S \setminus \{j\}} i}{\prod_{i \in S \setminus \{j\}} (i - j)}, \qquad j \in S.$$

These values are derived from the standard Lagrange interpolation formula.
These values are integers and it is clear that they are easy to compute. We also
have, from the Lagrange interpolation formula, that

$$\Delta \cdot f(0) \equiv \sum_{j \in S} \lambda^S_{0,j}\, f(j) \pmod{p'q'}.$$

In other words we have

$$d \cdot \Delta \equiv \sum_{j \in S} \lambda^S_{0,j}\, s_j \pmod{p'q'}.$$

The combiner then computes

$$w = \prod_{j \in S} x_j^{2\lambda^S_{0,j}}.$$

To check this signature we note that $w^e = x^{4\Delta^2}$, where
$\gcd(e, 4\Delta^2) = 1$. As $e$ is coprime to $4\Delta^2$ we can find $a$, $b$
such that $a(4\Delta^2) + be = 1$, so that we finally have the signature
$y = w^a x^b$, which satisfies

$$y^e = (w^a x^b)^e = x.$$
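
The share generation and combination steps above can be exercised end to end with small numbers. The following is a minimal sketch under stated assumptions: toy safe primes, shares taken directly as $f(i) \bmod p'q'$, a SHA-256 based stand-in for the hash $h$, and no proof of correctness. All function names are hypothetical, and Python 3.8 or later is assumed for negative exponents in `pow`.

```python
import hashlib
import random
from math import factorial, gcd

def hash_to_zn_star(msg, n):
    """Toy hash into Z_n^* (assumption: SHA-256 with a counter, reduced mod n)."""
    ctr = 0
    while True:
        digest = hashlib.sha256(f"{ctr}|{msg}".encode()).digest()
        x = int.from_bytes(digest, "big") % n
        if x > 1 and gcd(x, n) == 1:
            return x
        ctr += 1

def deal(p1, q1, e, k, l, rng):
    """Dealer: share d among l parties with threshold k (p1, q1 stand for p', q')."""
    m = p1 * q1
    coeffs = [pow(e, -1, m)] + [rng.randrange(m) for _ in range(k - 1)]
    return {i: sum(a * i**t for t, a in enumerate(coeffs)) % m
            for i in range(1, l + 1)}

def sign_share(x, s_i, delta, n):
    """Signature share x_i = x^(2 * delta * s_i) mod n."""
    return pow(x, 2 * delta * s_i, n)

def lagrange0(S, j, delta):
    """Integer coefficient delta * prod(i) / prod(i - j) over i in S, i != j."""
    num, den = delta, 1
    for i in S:
        if i != j:
            num *= i
            den *= i - j
    assert num % den == 0            # integrality, as noted in the text
    return num // den

def combine(x, shares, delta, e, n):
    """Combine k shares into y with y^e = x (mod n)."""
    S = sorted(shares)
    w = 1
    for j in S:
        w = w * pow(shares[j], 2 * lagrange0(S, j, delta), n) % n
    e1 = 4 * delta**2
    a = pow(e1, -1, e)               # a * e1 = 1 (mod e)
    b = (1 - a * e1) // e            # exact division, so a * e1 + b * e = 1
    return pow(w, a, n) * pow(x, b, n) % n

p1, q1, e, k, l = 509, 593, 17, 3, 5          # toy parameters
n = (2 * p1 + 1) * (2 * q1 + 1)
delta = factorial(l)
s = deal(p1, q1, e, k, l, random.Random(1))
x = hash_to_zn_star("pay 95 to shop", n)
y = combine(x, {i: sign_share(x, s[i], delta, n) for i in (1, 3, 4)}, delta, e, n)
y_other = combine(x, {i: sign_share(x, s[i], delta, n) for i in (2, 4, 5)}, delta, e, n)
```

Any $k$ shares yield the same $y$, because $y$ is the unique $e$-th root of $x$ modulo $n$.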

4 Undetachable Threshold Signatures 

We now introduce the notion of an undetachable threshold signature. Suppose
a user has a private signature key $s$ and a public verification key $v$. Suppose
also that the user has a ‘constraint string’ $R$, which will define what types of
signature can be created. Then an undetachable threshold signature scheme will
enable the user to provide $n$ entities with ‘shares’ of the private signature key
(where the shares will be a function of $R$), where the following properties must
be satisfied:

– each entity can use their share to sign a message $M$ of their choice to obtain
a ‘signature share’;

– the ‘correctness’ of a signature share can be verified independently of any
other signature shares;

– any entity, when equipped with $k$ different signature shares for the same
message $M$, can construct a signature on the message $M$ which will be
verifiable by any party with a trusted copy of the public key of the user, and
which will also enable the string $R$ to be verified;

– knowledge of fewer than $k$ different signature shares for the same message $M$
cannot be used to construct a valid signature on the message $M$;

– knowledge of any number of different signature shares for messages other
than $M$ will not enable the construction of a valid signature on message $M$;

– knowledge of any number of different signature shares for constraint strings
other than $R$ will not enable the construction of a valid signature with
associated constraint string $R$.

As discussed above, the motivation for introducing this concept is that the
use of a threshold signature scheme or an undetachable signature scheme on its own
would not protect against all possible attacks in a mobile agent scenario. We now
describe an example of such a scheme. For brevity, we only give the necessary
changes to the threshold scheme in Section 3 to form the undetachable threshold
signature scheme.

Recall that the secret share for shareholder $i$ consists of a number $s_i$. Let
$h$ be an appropriate hash function. The signature share of this shareholder for
a message $m$ is then

$$x_i = x^{2\Delta s_i},$$

where $l$ is the total number of shares, $\Delta = l!$ and $x = h(m)$ is a hash of
the message.

As in Section 2 let $I$ be the identifier of a user and let $R$ be the user
requirements. Let $H = h(I, R)$ be a hash of the requirements. We replace the
share $s_i$ with a pair $(H, t_i = H^{2\Delta s_i})$. To sign a bid $B$ the
shareholder calculates $C = h(B)$ and

$$t_i^C = (H^{2\Delta s_i})^C = H^{2\Delta s_i C} = (H^C)^{2\Delta s_i}.$$

Thus, when all the shares are combined the combiner will have a signed copy of
$H^C$, thus achieving a signed undetachable signature.

We observe that a proof of security is given for the scheme in Section 3 
provided that k is one greater than the number of corrupt servers (in the case 
where k exceeds the number of corrupt servers by a greater number a slightly 
adapted scheme is used) . With this information to hand we note that this scheme 
is secure as long as the undetachable scheme given in [2] is secure, and that this 
scheme appears to be sound. 
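
The combined scheme can be traced in the same toy setting. This is a sketch under stated assumptions rather than the authors' implementation: small safe primes, shares taken as $f(i) \bmod p'q'$, a SHA-256 based stand-in for $h$, invented identifier and bid strings, and no proofs of correctness. Each agent carries $(H, t_i)$ and never sees $s_i$ or $d$.

```python
import hashlib
import random
from math import factorial, gcd

def h(n, *parts):
    """Toy hash into Z_n^* (assumption: SHA-256 plus counter, reduced mod n)."""
    ctr = 0
    while True:
        data = "|".join(map(str, (ctr,) + parts)).encode()
        x = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
        if x > 1 and gcd(x, n) == 1:
            return x
        ctr += 1

def lagrange0(S, j, delta):
    """Integer coefficient delta * prod(i) / prod(i - j) over i in S, i != j."""
    num, den = delta, 1
    for i in S:
        if i != j:
            num, den = num * i, den * (i - j)
    return num // den

# User (dealer): toy safe-prime modulus, threshold k out of l agents.
p1, q1, e, k, l = 509, 593, 17, 3, 5
n, m, delta = (2 * p1 + 1) * (2 * q1 + 1), p1 * q1, factorial(l)
rng = random.Random(7)
coeffs = [pow(e, -1, m)] + [rng.randrange(m) for _ in range(k - 1)]
s = {i: sum(a * i**t for t, a in enumerate(coeffs)) % m for i in range(1, l + 1)}

# Each agent i is given the pair (H, t_i) in place of the share s_i.
H = h(n, "alice", "price<=100")
t = {i: pow(H, 2 * delta * s[i], n) for i in s}

# The hosts of agents 2, 3 and 5 sign the same bid B.
C = h(n, "bid: price=95")
partial = {i: pow(t[i], C, n) for i in (2, 3, 5)}

# Combiner: Lagrange-combine, then remove the factor 4 * delta^2 from the exponent.
S = sorted(partial)
w = 1
for j in S:
    w = w * pow(partial[j], 2 * lagrange0(S, j, delta), n) % n
e1 = 4 * delta**2
a = pow(e1, -1, e)
b = (1 - a * e1) // e
target = pow(H, C, n)                     # the value H^C being signed
y = pow(w, a, n) * pow(target, b, n) % n  # satisfies y^e = H^C (mod n)
```

A verifier who knows $(I, R)$, the bid and the public key $(n, e)$ can recompute $H^C$ and check it against $y^e \bmod n$.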

Abstract. Divide and conquer attacks try to recover small portions of
cryptographic keys one by one. Usually, a wrong guess makes subsequent
ones useless. Hence possible errors should be detected and corrected
as soon as possible. In this paper we introduce a new (generic) error
detection and correction strategy. Its efficiency is demonstrated on
various examples, namely a power attack, two timing attacks against
RSA implementations with and without the Chinese Remainder Theorem,
and a timing attack against the future AES (Rijndael). As the design of
efficient countermeasures requires a good understanding of an attack’s
actual power, the possible improvement induced by sophisticated error
detection and correction should not be neglected. Although divide and
conquer attacks are typical of side-channel attacks, we would like to
stress that they are not restricted to that field, as will be illustrated by
Siegenthaler’s attack.

Keywords: Error detection, error correction, timing attack, power at- 
tack. 



1 Introduction 

Cryptographic algorithms (encryption schemes, random generators, ...) often
base their security on one (or a few) secret parameter(s), whereas the rest of
the design is left public.

To seize this secret parameter (usually denoted as the key), a frequent attack
scenario assumes that the attacker is able to observe the output of the algorithm,
possibly with access to, or even control of, this algorithm’s input during a
limited period of time. The attacker will use these observations to deduce
information on the key.

In particular, divide and conquer attacks basically consist in dividing the
key into smaller pieces, whose size makes exhaustive search possible, and then
handling these pieces separately. Such attacks are very efficient, but will of course
only be feasible provided that it is possible to guess parts of the key separately.
In other words, it must be possible, with reasonable probability, to confirm or
invalidate a partial key guess without knowing the other parts of the key.

In many cases, a wrong partial key guess makes subsequent ones worthless. 
Consequently, it is desirable to be able to validate the guesses made so far. This 
suggests the following iterated process: 

1. guess a part of the key;

2. check whether the guess is correct so far (error detection);

3. if yes, repeat step 1 with the next part of the key;

4. otherwise, identify probable error position(s) among previously guessed parts
(error location), and go back to step 1 trying another guess for that part
(error correction).
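
The four steps can be sketched as a loop over key bits. The code below is an idealised illustration: `guess_bit` and `prefix_ok` are hypothetical oracles standing in for attack-specific statistics, and the detection oracle is assumed to be exact, which real attacks can only approximate.

```python
import random

def recover_key(n_bits, guess_bit, prefix_ok, check_every=4):
    """Toy divide-and-conquer loop with error detection, location and correction.

    guess_bit(i, prefix) returns a (possibly wrong) guess for bit i;
    prefix_ok(prefix) tells whether all bits guessed so far are correct.
    """
    guess, confirmed = [], 0              # bits before `confirmed` are trusted
    while len(guess) < n_bits or not prefix_ok(guess):
        if len(guess) < n_bits:
            guess.append(guess_bit(len(guess), guess))        # step 1
            if len(guess) % check_every and len(guess) < n_bits:
                continue                                      # no check this round
        if prefix_ok(guess):                                  # step 2
            confirmed = len(guess)                            # step 3
            continue
        # Step 4: locate the earliest wrong bit after the confirmed part ...
        pos = confirmed
        while prefix_ok(guess[:pos + 1]):
            pos += 1
        # ... correct it, and drop the later guesses that depended on it.
        guess = guess[:pos] + [1 - guess[pos]]
        confirmed = pos + 1
    return guess

rng = random.Random(0)
key = [rng.randrange(2) for _ in range(32)]
noisy_guess = lambda i, prefix: key[i] ^ (rng.random() < 0.2)   # 20% wrong guesses
exact_check = lambda prefix: prefix == key[:len(prefix)]
recovered = recover_key(len(key), noisy_guess, exact_check)
```

Checking only every `check_every` bits reflects the trade-off discussed in the text: fewer checks cost fewer observations, but an undetected error wastes the guesses made after it.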

Many different attacks can be put in the divide and conquer class, and the rel- 
ative importance of the different parts (guess, error detection, ... ) may vary 
greatly from one to another. However, these parts are usually more or less in- 
dependent, and can therefore be subject to independent improvements, any of 
which would result in a global performance improvement. 

This paper focuses on the error management¹ part. Error management faces
several efficiency constraints. First, it must keep the sample size small: a simple
error check, for example, consists in repeating Step 1 with a new observation;
however, this reduces the efficiency of the attack by at least a factor of 2, which
is not acceptable for many realistic scenarios. Second, it must be time-efficient,
in the sense that a wrong guess must be identified as quickly as possible; but
this must be balanced against the risk of hindering the attack’s success:
a too restrictive strategy involves the risk of definitively rejecting a guess that
was correct, with the consequence that the attack will fail completely, whereas a
too permissive strategy will make the attack longer, possibly up to an impractical
running time.

¹ The idea that errors could be identified and corrected in a timing attack was first
mentioned by Kocher [7].

In this paper we do not concentrate on the attacks themselves, i.e. on strategies
to guess parts of the key (Step 1), nor on countermeasures which prevent
these attacks, but on error detection and error correction strategies (Steps 2 and
4). In particular, we mainly suggest a new type of error detection strategy which
may be characterized as a three-option decision strategy. Every time the error
detection is applied, it can either:

2.a. Validate the correctness of previous partial key guesses up to a certain point
(which need not necessarily be the current one). This yields a “confirmed
index” (i.e. we definitively assume that that part of the key is correct)
which will facilitate later error locations/corrections, as only the partial
key guesses which were derived after the current confirmed index will be
considered later.

2.b. Conclude an error has occurred between this point and the last confirmed
index. In this case, we enter the error-correction phase.

2.c. If there are no convincing indicators for either of these two cases, do not
conclude. In this case, we simply continue the attack, and postpone the decision
to a future error detection step.
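
The three options can be pictured as two thresholds on a detection statistic. The sketch below is purely illustrative: the score, its orientation (larger values support the guesses made so far) and both thresholds are hypothetical, since the actual statistics depend on the attack.

```python
from enum import Enum

class Verdict(Enum):
    CONFIRM = "confirm"        # option 2.a: move the confirmed index forward
    ERROR = "error"            # option 2.b: enter the error-correction phase
    UNDECIDED = "undecided"    # option 2.c: postpone the decision

def three_option_decision(score, accept_at, reject_at):
    """Map a detection statistic to one of the three options.

    Assumes reject_at < accept_at; both thresholds are hypothetical tuning knobs.
    """
    if score >= accept_at:
        return Verdict.CONFIRM
    if score <= reject_at:
        return Verdict.ERROR
    return Verdict.UNDECIDED

# Example run over invented scores observed at successive detection steps.
confirmed = 0
for position, score in enumerate([0.5, 0.9, 0.6, 0.1], start=1):
    verdict = three_option_decision(score, accept_at=0.8, reject_at=0.2)
    if verdict is Verdict.CONFIRM:
        confirmed = position
    elif verdict is Verdict.ERROR:
        break   # the error lies between `confirmed` and `position`
```

Widening the gap between the two thresholds makes wrong confirmations and wrong rejections rarer, at the price of more undecided steps.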

This concept is rather generic and can be adapted to various types of attacks. 
We will consider five examples which may seem to be very different at first sight. 

Section 2 briefly introduces the subject by presenting an extreme example of
a divide and conquer attack, in which the error correction and detection strategy
is so efficient that it constitutes the most important part of the attack. Section
3, which is the main part of this paper, then develops the error management
strategy in detail, on a timing attack example. In this example, the sophisticated
error detection and correction strategy, together with an optimized bit
estimation strategy, made it possible to improve the efficiency of a timing attack
on RSA ([6]) by a factor of about 50. Section 4 shows how a similar technique can
be applied to a timing attack against another RSA implementation (using the
Chinese Remainder Theorem (CRT) and thus being resistant against the attacks
considered in [6] and [13]). Section 5 treats a timing attack on a careless AES
(Rijndael) implementation (cf. [8]) which is atypical, as the key parts can in
principle be guessed independently and errors do not influence the later guesses.
However, in this case too an efficient error location strategy is evaluated. We
would like to insist on the fact that the proposed error management strategy is
in fact much more general, and can be applied to many other divide and conquer
attacks. Although the examples developed here correspond to physical attacks
(timing attack, power analysis), divide and conquer attacks may also appear in
algorithmically based attacks. This will be sketched in Sect. 6.

We point out that this paper was not written as a “manual” for potential
attackers but is intended to help designers assess the realism and the true threat
of certain divide and conquer attacks.



2 Messerges et al.’s Power Analysis 

Messerges et al.’s Multiple Exponent, Single Data (MESD) attack ([10])
corresponds to an extreme form of divide and conquer attack. Although this
attack’s efficiency leaves very little room for improvement through a new
error-detection policy, we believe that, due to its simplicity, it constitutes a good
example to start with. The reader may consider it as evidence of how much an
adequate error-detection policy can improve an attack’s performance.

The context is that of a smart card accepting to exponentiate a constant
value with user-supplied exponents². The attacker is able to measure the power
consumption of that card, and tries to guess the secret exponent used by that
card. The attack could be described as follows. Assume the attacker already
knows the most significant $t$ bits $k_{w-1}, \ldots, k_{w-t}$ of the key:

Guessing phase: the attacker builds an exponent
$e = k_{w-1} \cdots k_{w-t}\, r_{w-t-1} \cdots r_0$, where the bits $r_i$ are
chosen at random, and submits this exponent to the card;

Error detection/location: the attacker compares the power consumption of the
exponentiation with $e$ to the power consumption of the exponentiation with the
secret key $k$, and notes the position $s$ of the first bit after which the two
curves differ (due to measurement errors, this may require averaging over
several exponentiations with the same input);

Error correction: the attacker builds a new exponent
$e' = k_{w-1} \cdots k_s\, \bar{k}_{s-1}\, r'_{s-2} \cdots r'_0$, where
$\bar{k}_i$ denotes $1 - k_i$ and the $r'_i$ are chosen at random.

² We will not discuss this attack’s realism here; see [10] for a brief discussion on the
subject.

In this case, the error location method is so efficient, allowing the error
position to be pinpointed almost exactly, that the guessing process can be limited
to its simplest form (random guess). Similarly, this precision in error location
allows a very simple error correction (bit inversion).
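
The guess, locate and correct cycle can be simulated with an idealised oracle. In the sketch below the power-trace comparison is replaced by a hypothetical function that directly returns the index of the first wrong bit (bits listed most significant first); a real attack would have to derive this index from measured, possibly averaged, traces.

```python
import random

def first_difference(key, guess):
    """Idealised trace comparison: index of the first differing bit, or None."""
    for i, (k_bit, g_bit) in enumerate(zip(key, guess)):
        if k_bit != g_bit:
            return i
    return None

def mesd_attack(locate_error, n_bits, rng):
    """Recover an n_bits exponent using only the error-location oracle."""
    guess = [rng.randrange(2) for _ in range(n_bits)]          # guessing phase
    queries = 0
    while True:
        pos = locate_error(guess)                              # error detection/location
        queries += 1
        if pos is None:
            return guess, queries
        # Error correction: invert the wrong bit, re-randomise the bits after it.
        tail = [rng.randrange(2) for _ in range(n_bits - pos - 1)]
        guess = guess[:pos] + [1 - guess[pos]] + tail

rng = random.Random(3)
secret = [rng.randrange(2) for _ in range(64)]
recovered, queries = mesd_attack(lambda g: first_difference(secret, g), 64, rng)
```

Each query fixes at least one further bit, so at most `n_bits + 1` queries are needed; random guessing of the tail typically gets about half of the remaining bits right for free.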



3 Timing Attack against RSA without CRT 

In this section we describe a timing attack against RSA signatures. The attack
([6]) was initially developed against a preliminary version of the Cascade ([2])
smart card, although it would work equally well against many other modular
exponentiation algorithms without CRT (see [6,13]). Our approach, which combines
the optimal decision strategy derived in [13] with a very efficient error detection
and correction strategy, increases the efficiency of the attack by a factor of
about 50. As this attack constitutes a systematic approach to exploiting
side-channel information in an optimal way, we will describe the attack and the
development of our error management policy in detail.

Remark 1. (i) The attack presented here is a pure timing attack, in the sense
that the only information at our disposal is a set of messages and, for each of
them, the total time required for the signature;

(ii) in view of [6], the final version of the Cascade cryptographic library was later
modified to resist timing attacks ([5]).



3.1 Definitions and Mathematical Background 

To compute $y^d \pmod{M}$ the Cascade chip uses the simple square and multiply
algorithm. Modular multiplications are carried out with Montgomery's algorithm
([11]). In its simplest variant $R := 2^w > M$ where $w$ fits the device's
hardware architecture. Let $R^{-1} \in Z_M := \{0, \ldots, M-1\}$ denote the
multiplicative inverse of $R$ in $Z_M$, i.e. $RR^{-1} \equiv 1 \pmod{M}$. The
integer $M' \in Z_R$ satisfies the integer equation $RR^{-1} - MM' = 1$. To
simplify our notation we introduce the functions $\Psi, \Psi^*: Z \to Z_M$
defined by $\Psi(x) := xR \pmod{M}$ and $\Psi^*(x) := xR^{-1} \pmod{M}$. For
$a' = \Psi(a)$ and $b' = \Psi(b)$ Montgomery's algorithm returns
$s := \Psi^*(\Psi(a)\Psi(b)) = \Psi(ab)$.



Montgomery's algorithm

    $z := a'b'$
    $r := (z \pmod{R}) M' \pmod{R}$
    $s := (z + rM) / R$
    if ($s \ge M$) then $s := s - M$
    return $s$ ($= \Psi^*(a'b') = a'b'R^{-1} \pmod{M}$)
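The steps of Montgomery's algorithm can be sketched in a few lines of Python. This is only an illustration under our own naming (`montgomery_setup`, `montgomery_multiply` and the returned flag `extra` are hypothetical, not part of the Cascade library); it computes $M'$ from the integer equation $RR^{-1} - MM' = 1$ and reports whether the extra reduction was taken.

```python
def montgomery_setup(M, w):
    """Return (R, M_prime) for R = 2**w > M with R*R^{-1} - M*M' = 1."""
    R = 1 << w
    if M % 2 == 0 or R <= M:
        raise ValueError("need an odd modulus M < R = 2**w")
    R_inv = pow(R, -1, M)            # inverse of R modulo M (Python 3.8+)
    M_prime = (R * R_inv - 1) // M   # exact division by construction
    return R, M_prime


def montgomery_multiply(a_m, b_m, M, R, M_prime):
    """Return (s, extra) with s = a_m * b_m * R^{-1} mod M.

    extra is True iff the final subtraction (extra reduction) was needed.
    """
    z = a_m * b_m
    r = ((z % R) * M_prime) % R
    s = (z + r * M) // R             # z + r*M is divisible by R
    extra = s >= M
    if extra:
        s -= M
    return s, extra
```

For operands in Montgomery form, `montgomery_multiply(a*R % M, b*R % M, ...)` returns $\Psi(ab)$; the boolean `extra` is exactly the event whose frequency the timing attack exploits.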



Algorithm 1 (square and multiply using Montgomery's algorithm)

    temp := $\Psi(y)$
    for i = w-2 downto 0 do {
        temp := $\Psi^*(\mathrm{temp}^2)$;
        if ($d_i = 1$) temp := $\Psi^*(\mathrm{temp} \cdot \Psi(y))$;
    }
    return $\Psi^*(\mathrm{temp})$;

The secret exponent $d$ has binary representation $(d_{w-1} d_{w-2} \cdots d_0)_2$
where $d_{w-1}$ denotes its most significant bit. Further, ham$(d)$ denotes the
Hamming weight of $d$. The subtraction $s := s - M$ in Montgomery's algorithm is
called extra reduction while $y \mapsto \Psi(y)$ and temp $\mapsto
\Psi^*(\mathrm{temp})$ are the pre-multiplication and the post-multiplication.
For any $a', b' \in Z_M$ we have Time$(\Psi^*(a'b')) = c$ if no extra reduction
is necessary and $= c + c_{ER}$ otherwise. The sources of our timing attack are
time differences which are caused by different numbers of extra reductions
within the for-loop of Algorithm 1.
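To make the timing model concrete, the following sketch runs Algorithm 1 and counts the extra reductions inside the for-loop. It is an emulation under our own assumptions: the function names, the separate parameter `w_R` for the exponent of $R$, and the constants `c` and `c_er` passed to `loop_time` are hypothetical, and pre- and post-multiplication are not counted, as in the text.

```python
def mont(a, b, M, R, M_prime):
    """Montgomery product a*b*R^{-1} mod M and an extra-reduction flag (0/1)."""
    z = a * b
    r = ((z % R) * M_prime) % R
    s = (z + r * M) // R
    return (s - M, 1) if s >= M else (s, 0)


def exponentiate(y, d, M, w_R):
    """Square and multiply (Algorithm 1) for 0 <= y < M, d >= 1, odd M < 2**w_R.

    Returns (y**d mod M, extra reductions in the loop, multiplications in the loop).
    """
    R = 1 << w_R
    M_prime = (R * pow(R, -1, M) - 1) // M
    y_m = (y * R) % M                 # pre-multiplication Psi(y)
    temp = y_m
    extra = mults = 0
    for i in range(d.bit_length() - 2, -1, -1):
        temp, er = mont(temp, temp, M, R, M_prime)     # squaring, type 'Q'
        extra += er
        mults += 1
        if (d >> i) & 1:
            temp, er = mont(temp, y_m, M, R, M_prime)  # multiplication, type 'M'
            extra += er
            mults += 1
    result, _ = mont(temp, 1, M, R, M_prime)           # post-multiplication Psi*(temp)
    return result, extra, mults


def loop_time(d, extra, c, c_er):
    """Time spent in the for-loop: (w + ham(d) - 2)*c + extra*c_er."""
    w = d.bit_length()
    ham = bin(d).count("1")
    return (w + ham - 2) * c + extra * c_er
```

The count `mults` always equals $w + \mathrm{ham}(d) - 2$, so for a fixed exponent the only data-dependent part of `loop_time` is the number of extra reductions.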

Remark 2. Many implementations (Cascade among them) use a more efficient
multiprecision variant of Montgomery's algorithm (see e.g. [9], Algorithm 14.36)
than the one listed above. This influences the absolute value of the constants
$c$ and $c_{ER}$ but not whether an extra reduction is necessary ([14], Remark
1). We may hence restrict our analysis to the simplest variant of Montgomery's
algorithm described above.



Let $t := \mathrm{Time}(y^d \pmod{M})$. For a sample $y_{(1)}, \ldots, y_{(N)}
\in Z_M$ the attacker measures $\tilde{t}_{(1)} := t_{(1)} + t_{Err(1)}, \ldots,
\tilde{t}_{(N)} := t_{(N)} + t_{Err(N)}$ where $t_{Err(j)}$ denotes the
measurement error. More precisely,

$$\tilde{t}_{(j)} = t_{Err(j)} + t_{S(j)} + (w + \mathrm{ham}(d) - 2)c + r_{(j)} c_{ER} \qquad (1)$$

where $t_{S(j)}$ denotes the time needed for set-up operations such as input,
output, increasing the loop variable, evaluating the if-statements and, above
all, the pre- and post-multiplication. Finally, $r_{(j)}$ denotes the number of
extra reductions needed within the for-loop of Algorithm 1, i.e.
$r_{(j)} = w_{1(j)} + \cdots + w_{w+\mathrm{ham}(d)-2(j)}$,
where $w_{i(j)} = 1$ if the $i$-th Montgomery multiplication in Algorithm 1
requires an extra reduction for basis $y_{(j)}$ while $w_{i(j)} = 0$ else. From
(1) we derive the "discretized running time"

$$\tilde{t}_{d(j)} := \frac{\tilde{t}_{(j)} - t_{S(j)} - (w + \mathrm{ham}(d) - 2)c}{c_{ER}}. \qquad (2)$$



If $t_{Err(j)} = 0$ (i.e. for exact time measurement), $\tilde{t}_{d(j)}$ equals
$r_{(j)}$, the total number of extra reductions needed within the for-loop of
Algorithm 1. The values $w_{1(j)}, w_{2(j)}, \ldots$ can be interpreted as
realizations (i.e. values assumed by) of a particular non-stationary sequence of
random variables $W_{1(j)}, W_{2(j)}, \ldots$ which are closely related to the
Montgomery multiplications within Algorithm 1 (cf. [13], Sect. 6). (Numerous
empirical experiments confirmed perfectly the suitability of this mathematical
model.) In particular, the definition of $W_{i(j)}$ explicitly depends on the
basis $y_{(j)}$ and whether the $i$-th Montgomery multiplication within the
for-loop in Algorithm 1 is a squaring (shortly: type$(i)$ = 'Q') or a
multiplication with $\Psi(y_{(j)})$ (shortly: type$(i)$ = 'M'), resp. To derive
an optimal decision strategy, the sequence $W_{1(j)}, W_{2(j)}, \ldots$ has to
be studied first. We briefly give the main results. For a proof the interested
reader is referred to [13] (Lemma 6.3). In particular, the expectation of
$W_{i(j)}$ (which equals the probability for an extra reduction in the $i$-th
Montgomery multiplication for basis $y_{(j)}$) is given by



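$$E(W_{i(j)}) = \begin{cases} p_* := \dfrac{1}{3}\dfrac{M}{R} & \text{if type}(i) = \text{'Q'} \\[2mm] p_j := \dfrac{\Psi(y_{(j)})}{2M}\dfrac{M}{R} & \text{if type}(i) = \text{'M'}. \end{cases} \qquad (3)$$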
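The two extra-reduction probabilities, $M/(3R)$ for a squaring and $\Psi(y)/(2R)$ for a multiplication with a fixed factor, can be checked empirically. The sketch below is a Monte Carlo experiment of our own (uniformly random operands instead of the actual intermediate values of an exponentiation; all names, the modulus and the seed are illustrative assumptions).

```python
import random


def extra_reduction(a, b, M, R, M_prime):
    """True iff the Montgomery multiplication of a and b needs an extra reduction."""
    z = a * b
    r = ((z % R) * M_prime) % R
    return (z + r * M) // R >= M


def estimate_probabilities(M, R, y_m, trials, seed=1):
    """Empirical extra-reduction frequencies for squarings and for
    multiplications with the fixed factor y_m, using random operands."""
    rng = random.Random(seed)
    M_prime = (R * pow(R, -1, M) - 1) // M
    squarings = multiplications = 0
    for _ in range(trials):
        t = rng.randrange(M)
        squarings += extra_reduction(t, t, M, R, M_prime)
        multiplications += extra_reduction(t, y_m, M, R, M_prime)
    return squarings / trials, multiplications / trials


R = 1 << 64
M = (7 * R // 10) | 1          # odd modulus with M/R close to 0.7
y_m = M // 2                   # some fixed Montgomery-form base Psi(y)
freq_q, freq_m = estimate_probabilities(M, R, y_m, trials=20000)
print(freq_q, M / (3 * R))     # both should be near 0.233
print(freq_m, y_m / (2 * R))   # both should be near 0.175
```

The agreement is only statistical; with 20000 trials the standard deviation of each frequency is about 0.003.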



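The covariance $\mathrm{Cov}(W_{i(j)}, W_{i+1(j)})$ equals

$$\begin{cases} \mathrm{cov}_{MQ(j)} := 2p_j^3 p_* - p_j p_* & \text{if } (\text{type}(i), \text{type}(i+1)) = (\text{'M'}, \text{'Q'}) \\[1mm] \mathrm{cov}_{QM(j)} := \frac{9}{5} p_j p_*^2 - p_j p_* & \text{if } (\text{type}(i), \text{type}(i+1)) = (\text{'Q'}, \text{'M'}) \\[1mm] \mathrm{cov}_{QQ} := \frac{27}{7} p_*^4 - p_*^2 & \text{if } (\text{type}(i), \text{type}(i+1)) = (\text{'Q'}, \text{'Q'}) \end{cases} \qquad (4)$$

whereas

$$W_{i(j)} \text{ and } W_{h(j)} \text{ are independent if } |i - h| > 1. \qquad (5)$$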



3.2 Guessing 

Our attack estimates the exponent bits $d_{w-1}, \ldots, d_0$ successively. We
assume that the attacker has already estimated the exponent bits $d_{w-1},
\ldots, d_{k+1}$ and that his estimators $\tilde{d}_{w-1}, \ldots,
\tilde{d}_{k+1}$ have been correct. From these estimators he determines the
respective temp values before the if-statement in the for-loop for $i = k$ for
all bases $y_{(1)}, \ldots, y_{(N)}$ in his sample. Finishing the exponentiation
$y_{(j)}^d \pmod{M}$ still requires $k$ squarings. The number $m$ of remaining
multiplications with $\Psi(y_{(j)})$ results from $w$, ham$(d)$ and
$\tilde{d}_{w-1}, \ldots, \tilde{d}_{k+1}$. Subtracting the extra reductions
already carried out from the discretized running time $\tilde{t}_{d(j)}$ yields
the remaining discretized running time $t_{drem(j)}$ which is interpreted as a
realization of the random variable

$$T_{drem(j)} := T_{dErr(j)} + W_{w+\mathrm{ham}(d)-k-m-1(j)} + \cdots + W_{w+\mathrm{ham}(d)-2(j)}. \qquad (6)$$

It is reasonable to assume that the (random) measurement error is independent
of the running time and hence that $T_{dErr(j)}$ is independent of
$W_{w+\mathrm{ham}(d)-k-m-1(j)}, \ldots, W_{w+\mathrm{ham}(d)-2(j)}$. Further,
we assume that it is normally distributed with expectation 0 and variance
$\alpha^2 := \sigma^2 / c_{ER}^2$, where $\sigma^2$ denotes the variance of the
measurement error $t_{Err(j)}$.

From the 4-tuples $(t_{drem(1)}, u_{M(1)}, u_{Q(1)}, t_{Q(1)}), \ldots,
(t_{drem(N)}, u_{M(N)}, u_{Q(N)}, t_{Q(N)})$ the attacker derives an estimator
$\tilde{d}_k$ for the unknown exponent bit $d_k$. Here $t_{Q(j)}$ (resp.
$u_{M(j)}$, resp. $u_{Q(j)}$) $\in \{0,1\}$ equals 1 iff the next Montgomery
multiplication (i.e. squaring, resp. multiplication by $\Psi(y_{(j)})$, resp.
squaring after this multiplication) requires an extra reduction. Formally, this
can be interpreted as a statistical decision problem where the attacker has to
decide between the two hypotheses $\theta = 0$ (corresponding to the case
$d_k = 0$) and $\theta = 1$ (corresponding to $d_k = 1$) (cf. [13]). Theorem 1
gives the optimal decision strategy for the next bit value¹, i.e. a strategy
which minimizes the probability that $\tilde{d}_k \ne d_k$. As before, the
letter $N$ stands for the sample size.



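Notations. Within Theorem 1 we use the abbreviations

$$hn(0,j) := (k-1)p_*(1-p_*) + m p_j(1-p_j) + 2(m-1)\mathrm{cov}_{MQ(j)} + 2(m-1)\mathrm{cov}_{QM(j)} + 2(k-m-1)\mathrm{cov}_{QQ} + 2\tfrac{k-m}{k-1}\mathrm{cov}_{QM(j)} + 2\tfrac{m-1}{k-1}\mathrm{cov}_{QQ} + \alpha^2,$$

$$hn(1,j) := (k-1)p_*(1-p_*) + (m-1)p_j(1-p_j) + 2(m-2)\mathrm{cov}_{MQ(j)} + 2(m-2)\mathrm{cov}_{QM(j)} + 2(k-m)\mathrm{cov}_{QQ} + 2\tfrac{k-m+1}{k-1}\mathrm{cov}_{QM(j)} + 2\tfrac{m-2}{k-1}\mathrm{cov}_{QQ} + \alpha^2,$$

$$ew(0,j \mid b) := (k-1)p_* + m p_j + \tfrac{k-m}{k-1}(p_{*Q(b)} - p_*) + \tfrac{m-1}{k-1}(p_{jQ(b)} - p_j),$$

$$ew(1,j \mid b) := (k-1)p_* + (m-1)p_j + \tfrac{k-m+1}{k-1}(p_{*Q(b)} - p_*) + \tfrac{m-2}{k-1}(p_{jQ(b)} - p_j)$$

with

$$p_{*Q(1)} := \tfrac{27}{7}p_*^3, \quad p_{*Q(0)} := \frac{p_* - p_* p_{*Q(1)}}{1 - p_*}, \quad p_{jQ(1)} := \tfrac{9}{5}p_* p_j, \quad p_{jQ(0)} := \frac{p_j - p_* p_{jQ(1)}}{1 - p_*}.$$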



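Theorem 1. (i) Assume that the estimators $\tilde{d}_{w-1}, \ldots,
\tilde{d}_{k+1}$ have been correct and that $d_k + \cdots + d_0 = m$, i.e. for
each exponentiation $m$ Montgomery multiplications of type 'M' still have to be
carried out. Let $\psi_{N,d}: (\mathbb{R} \times \{0,1\}^3)^N \to \mathbb{R}$,

$$\psi_{N,d}\big((t_{drem(1)}, u_{M(1)}, u_{Q(1)}, t_{Q(1)}), \ldots, (t_{drem(N)}, u_{M(N)}, u_{Q(N)}, t_{Q(N)})\big) := -\frac{1}{2} \sum_{j=1}^{N} \left( \frac{\big(t_{drem(j)} - t_{Q(j)} - ew(0,j \mid t_{Q(j)})\big)^2}{hn(0,j)} - \frac{\big(t_{drem(j)} - u_{M(j)} - u_{Q(j)} - ew(1,j \mid u_{Q(j)})\big)^2}{hn(1,j)} \right).$$

Then the deterministic decision strategy $\tau_d: (\mathbb{R} \times
\{0,1\}^3)^N \to \{0,1\}$ defined by

$$\tau_d := 1_{\left\{\psi_{N,d} < \log\frac{m-1}{k-m+1} + \frac{1}{2}\sum_{j=1}^{N} \log(1 + c_j)\right\}}, \qquad c_j := \frac{hn(0,j) - hn(1,j)}{hn(1,j)} \qquad (7)$$

is optimal.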
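The decision rule can be sketched as follows. The code takes the values $hn(\theta,j)$ and $ew(\theta,j \mid \cdot)$ as precomputed inputs, so it does not depend on how they are obtained; the function names are ours, and the prior term $\log((m-1)/(k-m+1))$ is our reading of the decision boundary (it presumes the least significant exponent bit is known to be 1), so it should be treated as an assumption rather than as the authors' exact formula.

```python
import math


def psi(samples, hn0, hn1, ew0, ew1):
    """Statistic psi_{N,d}.

    samples[j] = (t_drem, u_M, u_Q, t_Q); ew0[j] is ew(0, j | t_Q) and
    ew1[j] is ew(1, j | u_Q), both already evaluated for sample j.
    """
    total = 0.0
    for (t_drem, u_m, u_q, t_q), h0, h1, e0, e1 in zip(samples, hn0, hn1, ew0, ew1):
        residual0 = t_drem - t_q - e0          # hypothesis d_k = 0
        residual1 = t_drem - u_m - u_q - e1    # hypothesis d_k = 1
        total += residual0 ** 2 / h0 - residual1 ** 2 / h1
    return -0.5 * total


def decision_boundary(k, m, hn0, hn1):
    """log prior ratio plus 0.5 * sum of log(1 + c_j), c_j = (hn0 - hn1) / hn1."""
    prior = math.log((m - 1) / (k - m + 1))    # assumed form, see lead-in
    return prior + 0.5 * sum(math.log(1 + (h0 - h1) / h1) for h0, h1 in zip(hn0, hn1))


def guess_bit(samples, k, m, hn0, hn1, ew0, ew1):
    """Return the estimator for d_k: 1 iff psi lies below the decision boundary."""
    return 1 if psi(samples, hn0, hn1, ew0, ew1) < decision_boundary(k, m, hn0, hn1) else 0
```

A small value of `psi` means that the residuals fit hypothesis $\theta = 1$ much better than $\theta = 0$, which is why the guess is 1 below the boundary.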

A nice property of the optimal decision strategy described above is that it
allows errors to be detected. It can be shown that, after an error has occurred
(i.e. $\tilde{d}_{k^*} \ne d_{k^*}$), the probability Prob$(\tilde{d}_k = 1)$
to guess 1 for subsequent bits is about 0.20 (although the exact value is
parameter-dependent and thus changes during the attack). The proof of this fact,
as well as precise probabilities, can be found in [13] (Theorem 6.5); for the
sake of simplicity, however, we will skip these rather complex expressions
here, and focus on the way we can exploit this error witness.

¹ For a proof of this theorem, the interested reader is referred to [13] (proof
of Theorem 6.5(i)).
Remark 3. It can also be shown that the error probability Prob$(\tilde{d}_k \ne
d_k)$ decreases as the attack proceeds. To quantify this probability, the
distribution of $Z := \psi_{N,d}(T_{drem(1)}, U_{M(1)}, \ldots, U_{Q(N)},
T_{Q(N)})$ has to be determined for both alternatives $\theta = 0$ and $\theta
= 1$. (We interpret $t_{drem(1)}, \ldots, t_{Q(N)}$ as realizations of specific
random variables $T_{drem(1)}, \ldots, T_{Q(N)}$.) The minimal sample size which
guarantees a particular error probability (e.g., 0.01) is essentially linear in
$k$. Once again, we refer the interested reader to [13] (Theorem 6.5) or to
[15] (theorem without proof) for precise expressions.

Let us illustrate these two facts (low probability to guess a 1 after an error
has occurred and decreasing error probability as the attack proceeds) by an
example (cf. [13]):

Example 1. In (a) and (b) below we assume that the estimators
$\tilde{d}_{w-1}, \ldots, \tilde{d}_{k+1}$ have been correct. For randomly
chosen bases $y_{(1)}, \ldots, y_{(N)}$ the following inequalities hold in good
approximation.

Let $M/R = 0.7$, $\alpha^2 = 0$, $N > 5620$, and further ...

(a) ... $(k, m) = (510, 255)$. Then Prob$(\tilde{d}_k \ne d_k) < 0.01$.

(b) ... $(k, m) = (440, 220)$. Then Prob$(\tilde{d}_k \ne d_k) < 0.0064$.

(c) ... $(k^*, m^*, k, m) = (505, 250, 470, 235)$.
Then Prob$(\tilde{d}_k = 1) < 0.1879$ and
Prob$(\psi_{N,d} < E_{\theta=1}(Z) \mid \tilde{d}_k = 1) < 0.0170$.

The error detection and correction strategy described below is more efficient
than its predecessor described in [13]. Roughly speaking, to estimate the
exponent bits the attacker derives a sequence of $\psi_{N,d}(\cdot)$-values on
which his decisions are based. These values themselves can be interpreted as
realizations of random variables (cf. Fig. 1 below) whose distribution changes
noticeably after the first wrong guess.

3.3 Error Detection 

The following diagram illustrates the facts we will base our error detection on.
The curves $g_0$, $g_1$ and $g_F$ are the density functions of $Z :=
\psi_{N,d}(\cdot)$ defined in Theorem 1 if $d_k = 0$, $d_k = 1$, or if
$\tilde{d}_{k'}$ was false for some $k' > k$, resp. (In particular, $g_0$,
$g_1$ and $g_F$ are normal densities; cf. the proof of Theorem 6.5 in [13].) As
we have seen, in the latter case, it is unlikely to derive $\tilde{d}_k = 1$.
If this happens nevertheless, the $\psi_{N,d}(\cdot)$-value is $>
E_{\theta=1}(Z)$ (the expectation of $Z$ if $\theta = 1$) with high
probability. Both observations will be the basis of our error detection and
correction strategy. The arrow in Fig. 1 points to the area corresponding to the
probability that $\psi_{N,d} < E_{\theta=1}(Z)$.

The task of an error detection strategy is to check whether an error has
occurred. The error probability decreases when $k$ decreases so that estimation
errors usually occur at the beginning of the attack, at least if $\alpha^2 = 0$
and if the attacker knows the values $w$, ham$(d)$, $c$, $c_{ER}$ and $t_S$ or
has guessed them correctly (cf. Subsect. 3.5). Note that errors may also occur
at the very end of the attack if the parameter guesses are not absolutely
correct; this, however, does no serious harm as the few final exponent bits may
be checked exhaustively: we will thus leave this case aside.

Fig. 1. Density functions of $\psi_{N,d}$ depending on $d_k$

The basic idea of our error-detection strategy is the following: before
guessing a new bit $d_k$, we will look at a "window" of the $f$ preceding bits,
and test the two (non-complementary!) hypotheses:

(A) The estimators $\tilde{d}_{w-1}, \ldots, \tilde{d}_{k+1}$ are correct.

(B) There is an index $k' > k + f$ for which $\tilde{d}_{k'}$ is wrong.

Roughly speaking, a false hypothesis (A) will be witnessed by an unusually
large number of 0s in the window, whereas a large number of 1s tends to refute
hypothesis (B). If we can reject neither (A) nor (B), then we simply increase
the window's length and test the hypotheses again; if the window reaches its
maximum size without a decision being possible, the attack advances one
position, deciding the next bit $d_k$ as usual.

If hypothesis (B) is rejected, then we can set up a confirmed index at position
$k + f + 1$, meaning that no error has occurred before this point. Therefore we
will not try to modify any bit located before it, i.e. any bit whose index is
greater than or equal to a confirmed index.

If hypothesis (A) is rejected ("alarm"), we must choose one index $k'' \in
\{con - 1, \ldots, k + 1\}$ (where con denotes the most recent confirmed
index), revert that decision, and start the attack over again from that point.
The attack will continue until either:

— a new confirmed index is established, in which case the algorithm "forgets"
that an alarm had occurred and continues the attack (having corrected the error
at $k''$), or

— a further alarm occurs, in which case we conclude that the index $k''$ was
not the first error position; we then choose a new index in the same set
$\{con - 1, \ldots, k + 1\}$ and restart from that point.

To determine the threshold values that lead to the rejection of hypotheses (A)
and (B), we interpret the sum nones $:= \tilde{d}_{k+f} + \cdots +
\tilde{d}_{k+1}$ as a realization of a binomial random variable $X_1$. If (A)
is true, then this variable's distribution is $X_1 \sim B(f, 0.5)$; we
therefore reject (A) if Prob$(X_1 \le \mathrm{nones}) < p_{al} := 0.0001$.

The condition to reject (B) is a bit more complicated: in a setup step, we used
Theorem 6.5(iii) ([13]) to obtain rough approximations $p_{err}$ and $p_{len}$
for the average probabilities Prob$(\tilde{d}_k = 1)$ and Prob$(\psi_{N,d} <
E_{\theta=1}(Z) \mid \tilde{d}_k = 1)$, resp., within the initial stage of the
attack under the condition that $\tilde{d}_{k'}$ was false for some $k' > k$.
We reject (B) if len $:= |\{k + 1 \le i \le k + f \mid \tilde{d}_i = 1,
\psi_{N,d} < E_{\theta=1}(Z)\}| \ge 2$ and Prob$(X_2 \ge \mathrm{len}) <
p_{con} := 0.0005$ where $X_2$ is $B(\mathrm{nones}, p_{len})$-distributed.
The choice of the probabilities $p_{al}$ and $p_{con}$ was somewhat arbitrary.
The values 0.0001 and 0.0005 turned out to be suitable.
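The two rejection tests reduce to binomial tail probabilities. The sketch below is one possible reading: the non-strict inequalities inside the tails, the function names, and the sample value 0.017 for $p_{len}$ (borrowed from Example 1(c)) are assumptions made for illustration.

```python
from math import comb


def binom_lower_tail(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, x + 1))


def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))


def reject_a(window_bits, p_al=0.0001):
    """Alarm: too few 1s among the f most recent estimators."""
    f = len(window_bits)
    nones = sum(window_bits)
    return binom_lower_tail(nones, f, 0.5) < p_al


def reject_b(window_bits, below_mean, p_len, p_con=0.0005):
    """Confirmation: too many guessed 1s whose psi value fell below E_{theta=1}(Z).

    below_mean[i] is True iff the psi value behind window_bits[i] was
    smaller than E_{theta=1}(Z).
    """
    nones = sum(window_bits)
    length = sum(1 for bit, low in zip(window_bits, below_mean) if bit == 1 and low)
    return length >= 2 and binom_upper_tail(length, nones, p_len) < p_con


print(reject_a([0] * 20))                         # a run of 20 zeros raises an alarm
print(reject_b([1] * 10, [True] * 10, p_len=0.017))
```

With these thresholds a window of 20 consecutive zeros has probability $2^{-20}$ under (A), far below $p_{al}$, while a balanced window does not trigger the alarm.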

3.4 Error Correction 

We have not yet detailed in which order the successive $k''$ are chosen during
the successive alarms: after a "new" alarm (i.e. an alarm when any preceding
one has been "forgotten") the indices $\{con - 1, \ldots, k + 1\}$ are ordered
with respect to the rank function defined below. The $k''$ will then be chosen
in that order, the index with the smallest rank first, until the next "new"
alarm.

The rank function is determined on the basis of two criteria: first, for
reasonable sample size $N$ the error probability for a single decision is small
and thus the (wrong) decision for the first false estimator should be "close".
To quantify this idea, let absdiff$(i)$ denote the number of indices $s$ in
$\{i + 8, \ldots, i + 1\}$ for which the absolute value $|\psi_{N,d}(\cdot) -
\mathrm{decbound}|$ is smaller than the respective term for index $i$. Here
decbound corresponds to the decision boundary, marking the limit between a
decision towards 0 and towards 1 in the decision strategy, and is defined as
$\log\frac{m-1}{k-m+1} + \frac{1}{2}\sum_{j=1}^{N} \log(1 + c_j)$ (cf. Theorem
1). Intuitively, absdiff$(i)$ denotes the number of recent (relative to $i$)
positions for which the decision was more difficult to make than for index $i$.

Second, from the $\psi_{N,d}(\cdot)$-values from which the estimators
$\tilde{d}_{con-1}, \ldots, \tilde{d}_{k+1}$ were derived we guess the "region"
where the first false estimator is located. For this, we define the intervals
$I_1 := (-\infty, E_{\theta=1}(Z)]$, $I_2 := (E_{\theta=1}(Z),
\mathrm{decbound}]$, $I_3 := (\mathrm{decbound}, E_{\theta=0}(Z)]$, and $I_4 :=
(E_{\theta=0}(Z), \infty)$ (these regions can be easily visualized on Fig. 1).
For $s \in \{con - 1, \ldots, k + 1\}$ we set iv$(s) := j$ if the respective
$\psi_{N,d}(\cdot)$-value was contained in $I_j$. Let us now consider the
distribution of successive estimators among these regions, and compare these
distributions before and after an error. Denote by $k^*$ the first error
position:

— before that error, the estimators should be equidistributed among the 4
regions; that is, Prob(iv$(s) = j) \approx 0.25$ for $s > k^*$;







— at the error position, the probability is high for the estimator to be in
$I_2$ or $I_3$; in other words, Prob(iv$(k^*) = 2)$, Prob(iv$(k^*) = 3)
\approx 0.5$;

— finally, we have seen that, after the error has occurred, decisions will be
biased towards 0; the precise distribution is given by Theorem 6.5(iii) in
[13], which we omit here for simplicity.

To determine the region where the first error was located, we define

$$L(t) := 0.25^{\,con-t-1} \prod_{s=k+1}^{t} \mathrm{Prob}(\mathrm{iv}(s))$$

and retain as $k''$ the value of $t$ that maximizes this term. (In fact, the
terms Prob(iv$(s)$) equal the respective probabilities under the assumption
that the first error occurred at index $t$. The term $L(t)$ is an approximation
for the probability for the observed iv$(\cdot)$-values to have occurred if the
first error position was $t$ (cf. also [15]).)

Finally, we define posmax on $\{con - 1, \ldots, k + 1\}$ by posmax$(i) :=
|k'' - i|$ and use the rank function

$$\mathrm{rank}(i) := 7.0 \cdot \mathrm{absdiff}(i) + 3.0 \cdot \mathrm{posmax}(i). \qquad (8)$$

In the same way as for the probabilities $p_{err}$ and $p_{len}$ we used
Theorem 6.5(iii) in [13] to pre-compute average values for the probabilities
Prob(iv$(s) = j)$ to save computation time. The rank function turned out to be
very efficient (cf. Subsect. 3.5).
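The ordering of correction candidates by the rank function (8) can be sketched as follows. The data layout (a dictionary from bit index to the $\psi_{N,d}$ value used for that decision), the parameter `k_max` standing for the maximizer of $L(t)$, and the function names are our own illustrative choices.

```python
def absdiff(i, psi_values, decbound, lookback=8):
    """Number of indices s in {i+1, ..., i+lookback} whose psi value lies
    closer to the decision boundary than the psi value of index i."""
    own = abs(psi_values[i] - decbound)
    return sum(
        1
        for s in range(i + 1, i + lookback + 1)
        if s in psi_values and abs(psi_values[s] - decbound) < own
    )


def rank(i, psi_values, decbound, k_max):
    """rank(i) = 7.0 * absdiff(i) + 3.0 * |k_max - i|, as in (8)."""
    return 7.0 * absdiff(i, psi_values, decbound) + 3.0 * abs(k_max - i)


def correction_order(candidates, psi_values, decbound, k_max):
    """Candidate indices, smallest rank first (ties keep their input order)."""
    return sorted(candidates, key=lambda i: rank(i, psi_values, decbound, k_max))


psi_values = {10: 0.1, 11: 5.0, 12: -4.0, 13: 0.05}
print(correction_order([10, 11, 12, 13], psi_values, decbound=0.0, k_max=11))
```

In this toy data set index 13 comes first: its decision was the closest call among its neighbours, which outweighs its distance from `k_max`.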

3.5 Practical Experiments / Efficiency 

Although the original attack also used an error correction strategy, about
200 000 to 300 000 time measurements were necessary to recover a 512-bit key.
In this section we present empirical results where we applied the optimal
decision strategy stated in Subsect. 3.2 and the error detection and correction
strategy from Subsects. 3.3 and 3.4. For our experiments we distinguished two
cases. In the ideal case we assumed that the time measurements are exact, that
the attacker knows the constants and parameters $c$, $c_{ER}$, $w$ and
ham$(d)$, and that he is able to determine the setup time exactly. (This
corresponds to a computer emulation of Algorithm 1 with the number of extra
reductions as output.) However, if the attacker does not have exact knowledge
of the smart card implementation these assumptions may not be realistic.
Therefore, we also conducted what we call a real-life attack, where the
attacker's knowledge and abilities were assumed to be lower, using actual
timing measurements from a smart card running a cryptographic library ([2]). In
this case, we assumed that the attacker does not have precise implementation
knowledge and exploited various relations to estimate these values. We refer
the interested reader to [15] (Sect. 6) for further details.

Remark 4. As no physical smart card is available yet, timing measurements were
in fact performed using an emulator [1] which predicts the running time (in
clock cycles) of a program. The code we used was the ready-for-transfer version
of the Cascade library, i.e. with critical routines directly written in the
card's native assembly language. Since the emulator is designed to allow
implementors to optimize their code before "burning" the actual smart cards,
its predictions should match almost perfectly. Consequently, physical attacks
on the smart card should not induce many additional measurement errors.

Tables 1 and 2 contain empirical results where we used the optimal decision
strategy without (Table 1) and combined with error detection and correction
(Table 2). The last two columns respectively correspond to the ideal case, in
which all running times are measured exactly, and to the real-life attack on
the Cascade chip.



Table 1. Optimal decision strategy without error correction

    Key size    Number of        Success rate    Success rate
    (bits)      measurements     (ideal case)    (real-life attack)
    512          5 000           12%             15%
    512          6 000           35%             32%
    512          7 000           55%             40%
    512          8 000           65%             46%
    512          9 000           95%             72%
    512         10 000           98%             92%


Table 2. Optimal decision strategy with error correction

    Key size    Number of        Success rate    Success rate
    (bits)      measurements     (ideal case)    (real-life attack)
    512          5 000            85%             74%
    512          6 000            95%             85%
    512          7 000            98%             89%
    512          8 000           100%             91%
    512          9 000           100%             94%
    512         10 000           100%            100%


Compared with the original attack presented in [6], our approach (the optimal
decision strategy combined with the error detection and correction strategy
described in Subsects. 3.3 and 3.4) has improved the efficiency by a factor of
about 50 for 512-bit keys, at no cost in prerequisites, generality, or
complexity. This improvement factor is expected to grow even further with
larger keys. The main portion of the improvement is obviously due to the
optimal bit estimation strategy, but a brief comparison between both tables
shows that the applied error handling itself reduces the sample size by about
40 per cent. In the ideal case the index of the first false decision was ranked
in position one or two in more than 70 per cent of the trials. In the real-life
attack the efficiency of the rank function is somewhat lower.

To conclude this section, an important point to note is that, in this case, the
detection and correction of errors does not cost any additional time
measurements.



4 Timing Attack against RSA with CRT 



4.1 Description of the Attack 

Let $n = p_1 p_2$ denote an RSA modulus where $p_1$ and $p_2$ are large primes.
If the CRT is used the computation $y \mapsto y^d \pmod{n}$ decomposes into
three steps:

Step 1: $y_1 := y \pmod{p_1}$. Compute $x_1 := (y_1)^{d'} \pmod{p_1}$

Step 2: $y_2 := y \pmod{p_2}$. Compute $x_2 := (y_2)^{d''} \pmod{p_2}$

Step 3: Return $(b_1 x_1 + b_2 x_2) \pmod{n}$

The parameters $d'$, $d''$, $b_1$ and $b_2$ are precomputed once. In
particular, $d' := d \pmod{p_1 - 1}$, $d'' := d \pmod{p_2 - 1}$ while $b_1
\equiv 1 \pmod{p_1}$ and $b_1 \equiv 0 \pmod{p_2}$, and similarly, $b_2 \equiv
0 \pmod{p_1}$ and $b_2 \equiv 1 \pmod{p_2}$. Unlike in Sect. 3, the attacker
knows neither the bases $y_i$ nor the moduli $p_i$ in Steps 1 and 2. In
particular, the "classical" timing attacks ([7], [6], [13]) do not work. In
[14], however, a new timing attack was introduced which enables the
factorization of $n$ if the modular multiplications $\pmod{p_i}$ are carried
out with Montgomery's algorithm. We briefly describe the attack. For details,
the interested reader is referred to [14].
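The three CRT steps can be written out as a short sketch. It uses Python's built-in `pow` in place of the Montgomery-based exponentiation of the card (so it shows the data flow only, not the timing behaviour); the function names and the toy primes are illustrative assumptions.

```python
def crt_setup(p1, p2, d):
    """Precompute n, d', d'', b1, b2 for distinct primes p1, p2."""
    n = p1 * p2
    d1 = d % (p1 - 1)
    d2 = d % (p2 - 1)
    b1 = p2 * pow(p2, -1, p1)   # b1 = 1 (mod p1), b1 = 0 (mod p2)
    b2 = p1 * pow(p1, -1, p2)   # b2 = 0 (mod p1), b2 = 1 (mod p2)
    return n, d1, d2, b1, b2


def crt_exponentiate(y, p1, p2, d1, d2, b1, b2):
    """Compute y**d mod p1*p2 via Steps 1 to 3."""
    x1 = pow(y % p1, d1, p1)             # Step 1
    x2 = pow(y % p2, d2, p2)             # Step 2
    return (b1 * x1 + b2 * x2) % (p1 * p2)   # Step 3


p1, p2, d = 1009, 1013, 65537
n, d1, d2, b1, b2 = crt_setup(p1, p2, d)
y = 123456
print(crt_exponentiate(y, p1, p2, d1, d2, b1, b2) == pow(y, d, n))
```

Note that the exponentiations in Steps 1 and 2 see only $y \bmod p_i$, which is why an attacker who does not know the $p_i$ cannot predict their intermediate values.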

As the primes $p_1$ and $p_2$ are of similar size it is reasonable to assume
that the Montgomery constant $R$ (cf. Sect. 3) is equal for both
multiplications $\pmod{p_1}$ and $\pmod{p_2}$. For simplicity, for the moment
we assume that the modular exponentiations in Steps 1 and 2 are carried out
with the square & multiply algorithm. In accordance with Sect. 3 we define the
mappings $\Psi_i: Z \to Z_{p_i}$ by $\Psi_i(z) := zR \pmod{p_i}$. Let $0 < u_1
< u_2 < n$ with $u_2 - u_1 < p_1, p_2$. Three cases are possible:

Case A: $\{u_1 + 1, \ldots, u_2\}$ does not contain a multiple of $p_1$ or $p_2$.

Case B: $\{u_1 + 1, \ldots, u_2\}$ contains a multiple of one of $p_1$ or $p_2$ but not of both.

Case C: $\{u_1 + 1, \ldots, u_2\}$ contains a multiple of both $p_1$ and $p_2$.

Let $R^{-1}$ denote the multiplicative inverse of $R$ modulo $n$. For input
$uR^{-1} \pmod{n}$ clearly $\Psi_i(uR^{-1} \pmod{n}) = u \pmod{p_i}$ for $i =
1, 2$. In Step $i$ the probability for an extra reduction in a Montgomery
multiplication with $u \pmod{p_i}$ equals $(u \pmod{p_i})/2R$ (cf. (3) or
Theorem 1 in [14]) while the probability for an extra reduction in a squaring
is $p_i/3R$, independent of the base. The running time for the input $uR^{-1}
\pmod{n}$, denoted by $T(u)$, is interpreted as a realization of a random
variable $X_u$ (cf. [14]). The expectation of the difference $X_{u_2} -
X_{u_1}$ depends essentially on whether Case A, Case B or Case C is true:

$$E(X_{u_2} - X_{u_1}) \approx \begin{cases} 0 & \text{in Case A} \\[1mm] -\dfrac{c_{ER}\sqrt{n}}{8R} & \text{in Case B} \\[2mm] -\dfrac{2c_{ER}\sqrt{n}}{8R} & \text{in Case C.} \end{cases} \qquad (9)$$
This observation is essential for the attack, which falls into three phases: In
Phase 1 an "interval set" $\{u_1 + 1, \ldots, u_2\}$ has to be found which
contains an integer multiple of $p_1$ or $p_2$. Starting from this set, in
Phase 2 a sequence of decreasing interval subsets has to be determined, each of
which contains an integer multiple of $p_1$ or $p_2$. More precisely, in each
step of Phase 2 it is checked whether the upper subset $\{u_3 + 1, \ldots,
u_2\}$ (with $u_3 := \lfloor (u_1 + u_2)/2 \rfloor$) contains such a multiple
or not. The decisions in Phases 1 and 2 are based on the time differences
$T(u_2) - T(u_1)$ or $T(u_2) - T(u_3)$, resp., where the attacker decides for
"Case A" iff $T(u_2) - T(u_1) > -c_{ER}\sqrt{n}/16R$ or $T(u_2) - T(u_3) >
-c_{ER}\sqrt{n}/16R$, resp. (Note that there is no need to distinguish between
Cases B and C.) When the actual subset $\{u_1 + 1, \ldots, u_2\}$ is
sufficiently small Phase 3 begins, where $\gcd(u, n)$ is calculated for all $u$
contained in this subset. If all decisions within Phases 1 and 2 were correct
then the final subset indeed contains a multiple of $p_1$ or $p_2$. Then
Phase 3 delivers the factorization of $n$.
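Phases 2 and 3 amount to an interval bisection followed by a handful of gcd computations. The sketch below abstracts the timing decision into a callback `contains_multiple(lo, hi)`; the error-free oracle built from the secret primes is a stand-in of ours for demonstration, whereas a real attacker would derive each answer from a time difference.

```python
from math import gcd


def phase2_and_3(n, u1, u2, contains_multiple, final_size=16):
    """Shrink {u1+1, ..., u2}, which must contain a multiple of a prime
    factor of n, and return a nontrivial factor of n (or None)."""
    while u2 - u1 > final_size:
        u3 = (u1 + u2) // 2
        if contains_multiple(u3, u2):   # upper subset {u3+1, ..., u2}
            u1 = u3
        else:                           # then the lower subset must contain one
            u2 = u3
    for u in range(u1 + 1, u2 + 1):     # Phase 3
        g = gcd(u, n)
        if 1 < g < n:
            return g
    return None


def make_ideal_oracle(p1, p2):
    """Error-free stand-in for the timing decision (uses the secret primes)."""
    def contains_multiple(lo, hi):
        return any(hi // p - lo // p > 0 for p in (p1, p2))
    return contains_multiple


p1, p2 = 1009, 1013
factor = phase2_and_3(p1 * p2, 1500, 2500, make_ideal_oracle(p1, p2))
print(factor)
```

A single wrong answer from the oracle sends the bisection into a subset without any multiple, which is why the error detection of the next subsection matters.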

4.2 Detecting Errors 

However, at any instant within Phase 2 the attacker can verify with high
probability whether his decisions were correct so far, i.e. whether the actual
interval $\{u_1 + 1, \ldots, u_2\}$ really contains a multiple of $p_1$ or
$p_2$. He just has to apply the decision rule to a time difference for
neighbouring values of $u_1$ and $u_2$, resp., e.g. to $T(u_2 - 1) - T(u_1 +
1)$. If this leads to the same decision it is confirmed with overwhelming
probability that the interval $\{u_1 + 1, \ldots, u_2\}$ truly contains a
multiple of $p_1$ or $p_2$. (We then call $\{u_1 + 1, \ldots, u_2\}$ a
confirmed interval.)

Otherwise, the attacker evaluates a further time difference (e.g. $T(u_2 - 2)
- T(u_1 + 2)$). Depending on this difference he either finally confirms the
interval $\{u_1 + 1, \ldots, u_2\}$ or restarts the attack at the preceding
confirmed interval $\{u_{1;c} + 1, \ldots, u_{2;c}\}$ using values $u_1'$ and
$u_2'$ close to $u_{1;c}$ and $u_{2;c}$, resp.

As opposed to the attack of Section 3, this error detection method requires
additional time measurements to be carried out, and therefore has a significant
impact on the attack's efficiency. It would thus be useful to be able to reduce
the number of detection steps applied.

In [14] a static error detection and correction is applied: After a
pre-assigned number of steps (adapted to the parameters $n$ and $R$) an attempt
is made to establish a new confirmed interval. If this fails (due to a
preceding error) the attack is restarted at the preceding confirmed interval.
For $n \approx 0.7 \cdot 2^{1024}$ and $R = 2^{512}$ practical attacks required
570 time measurements on average, where an attempt to establish a confirmed
interval was made after every 42 steps.

Remark 5. (i) This attack is somewhat atypical as it does not directly
reconstruct the secret exponent $d$ itself. Instead, the interval sequence
delivers the bit representation of a multiple of $p_i$, beginning with the most
significant bits. This multiple, however, may be viewed as a "key" as its
knowledge enables the factorization of the modulus.

(ii) If the initial value $u_2$ in Phase 1 is chosen sufficiently small the
attacker will find a prime factor $p_i$ itself rather than just a multiple of
it. Using an algorithm of Coppersmith (cf. [3]), it then suffices to
reconstruct the upper half of the bit representation of $p_i$, which almost
halves the number of time measurements in Phase 2. For $n \approx 0.7 \cdot
2^{1024}$ and $R = 2^{512}$, for example, about 300 time measurements are
sufficient for the whole attack (cf. [14] (Remark 6)).

(iii) Although its efficiency decreases, the attack also works if table methods
are used in Steps 1 and 2 of the CRT. For a 4-ary table (storing $2^4 - 1 = 15$
values), for example, about 17700 time measurements are required on average
([14], Sect. 7).

4.3 Dynamic Error Detection and Correction 

However, this static error detection and correction strategy can still be
improved, using an argument similar to the one of Section 3.3. In fact, after a
wrong decision in Phase 2 the following decisions should always be "no multiple
of $p_1$ or $p_2$ in the upper subset $\{u_3 + 1, \ldots, u_2\}$". If all
decisions were correct so far, however, within Phase 2 this event should occur
with probability 1/2.

This suggests the following error detection and correction strategy: If in none
of the preceding $v$ (let's say $v = 13$) steps a multiple of $p_1$ or $p_2$
was assumed in the respective upper subinterval then try to confirm the actual
interval $\{u_1 + 1, \ldots, u_2\}$ by evaluating $T(u_2 + 1) - T(u_1 + 1)$.
If this attempt fails, restart the attack at the last but one interval where a
multiple of $p_1$ or $p_2$ was assumed in the upper subset (restart with
neighbouring values).

For $n \approx 0.7 \cdot 2^{1024}$ and $R = 2^{512}$, for example, the static
strategy described above requires about 45 of 570 time measurements on average
for error detection or correction reasons. However, the probability for a
single decision in Phase 1 or 2 to be wrong is about 0.001. Hence about 0.5
errors are expected within the attack whereas the probability for a "false
alarm" is about $2^{9-13} = 1/16$. Thus the proposed new error detection and
correction strategy costs about $0.5(2 + 15 + 2) + 2/32 \approx 10$ time
measurements, which reduces the average number of time measurements to about
535, a reduction of 6 per cent. If the modular exponentiations in Steps 1 and 2
of the CRT use tables, then the portion of time measurements carried out due to
error detection and correction reasons is considerably larger than for the
square & multiply algorithm. Consequently, the gain of efficiency caused by a
dynamic error detection and correction strategy (with an adapted value $v$)
also increases.
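The cost estimate above can be replayed numerically. The snippet only re-evaluates the figures quoted in the text; the variable names are ours, and reading $2^{9-13}$ as roughly $2^9$ decisions times a probability of $2^{-13}$ each is our interpretation.

```python
static_total = 570        # average measurements with the static strategy
static_overhead = 45      # of which spent on error detection and correction

p_false_alarm = 2.0 ** (9 - 13)                    # 1/16
dynamic_overhead = 0.5 * (2 + 15 + 2) + 2 / 32     # about 9.56 measurements

dynamic_total = static_total - static_overhead + round(dynamic_overhead)
reduction = (static_total - dynamic_total) / static_total

print(p_false_alarm, dynamic_overhead, dynamic_total, round(100 * reduction, 1))
```

The result, about 535 measurements and a saving of roughly 6 per cent, matches the figures in the text.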

5 Timing Attack against AES (Rijndael) 

5.1 Brief Description of Rijndael and the Vulnerable Model 

A complete description of Rijndael can be found in [4]. We will focus here on 
the parts of interest for the attack. 

A Rijndael encryption consists of an initial round key addition, followed by
$N_r$ round transformations, the last round being slightly different from the
others. The different transformations applied during each round operate on an
array of bytes, named the state, composed of 4 rows and $N_b$ columns (where
$N_b$ is the block size, in 32-bit words). Basically, each round, except the
last one, consists of the following steps: ByteSub (byte-by-byte substitution
S), ShiftRow (fixed permutation of bytes), MixColumn (described below) and
AddRoundKey (round key ⊕ state) where "⊕" denotes the bitwise XOR-addition. One
AddRoundKey operation is performed before the first round. In the final round
there is no MixColumn operation.

The MixColumn transformation operates on the columns of the state and
applies the following matrix multiplication to each of them:



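$$\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}$$

where the multiplication is defined in GF($2^8$) as multiplication of binary
polynomials modulo the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$.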

Due to the choice of the matrix and irreducible polynomial, MixColumn can
be implemented very efficiently: first, it is easy to see that, as
'03'='02'+'01', the only multiplications that will actually have to be
performed are by '02'; second, it can be shown (cf. [4]) that multiplication by
'02' in GF(256) can be implemented as follows:

— shift byte one position left,

— if a carry occurs, XOR the result with hexadecimal '1B'.
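The two steps above, and the MixColumn transformation built from them, can be sketched as follows. The function names are ours; the conditional branch in `xtime` is deliberately kept, because it is precisely the kind of data-dependent operation a careless implementation exhibits.

```python
def xtime(b):
    """Multiply the byte b by '02' in GF(2^8)."""
    b <<= 1                 # shift one position left
    if b & 0x100:           # a carry occurred
        b ^= 0x11B          # drop the carry bit and XOR with '1B'
    return b


def mix_column(column):
    """Apply the MixColumn matrix to one column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = column

    def mul3(x):            # '03' = '02' + '01'
        return xtime(x) ^ x

    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

For the column (DB, 13, 53, 45) the result is (8E, 4D, A1, BC); a constant-time implementation would replace the `if` by a branch-free mask.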

Assumptions. If the implementation is careless, the multiplications with ‘02’ 
and ‘03’ will not take constant time, but will take longer in the case where a 
carry occurs. Throughout the rest of this section, we assume such a behaviour. 
Moreover, we assume that the time needed for the remaining operations, i.e. 
for substitutions, permutations and XOR additions, does not depend on the 
plaintext and thus is constant for all encryptions. (However, slight deviations 
could be interpreted as a part of the measurement error; cf. Subsection 5.3.) 
For simplicity, we further assume that the length of the key (= $F$ bytes) is not
larger than that of a plaintext block (= $4M_b$ bytes). Finally, we assume that the
attacker knows at least one pair $(p, c := \mathrm{Enc}(p; k))$ where $\mathrm{Enc}(\cdot\,; k)$ stands for a
Rijndael encryption with the secret (unknown) key k. This pair will be used to
check whether a key candidate is correct or not.

5.2 Basic Idea 

Let us focus on what will happen to a given byte (say, the first) of a known 
plaintext during the first few encryption sub-steps: 

— before entering the first round, that byte will be XOR-ed with the first byte
of the first round key (equals the first byte of k) - call it $R_1$ - whose value
is unknown, but constant and independent of the plaintext;

— then a substitution S, according to a known S-box, will take place;

— the byte will then be moved around (to a known place) by ShiftRow, without
being modified;

— finally, MixColumn will be applied; during this operation, the byte will be
multiplied at least once by ‘02’ (the result may be stored and reused for the
multiplication with ‘03’). By assumption the multiplications will take longer
if the first (most significant) bit of the byte is set.

If we could observe the time taken by these particular multiplications, we would
be able to deduce the value of $R_1$ with about 8 encryptions. Of course, this
is not the case, as we have no access to partial timings, but only to the total
encryption time. However, if we encrypt a large number of random messages, we
can expect all other operations to behave as random noise and therefore hope to
be able to “filter out” the information we are interested in. Naturally, the same
method can be applied to guess the other bytes $R_2, \ldots, R_F$ of the secret
key k.

To make the attack robust against errors induced by random noise, we will
not build a single answer, but rather a set of possible keys, small enough to
make exhaustive search easy. More precisely, for each index $j \le F$ the attacker
determines a (small) set of “candidates”, denoted with $Ca_j$. After the “guessing
phase” has been finished the attacker computes $\mathrm{Enc}(p; ca_1 \| ca_2 \| \cdots \| ca_F)$ for
all possible combinations of key byte candidates (i.e. $ca_j \in Ca_j$ for each $j \le F$).
If $\mathrm{Enc}(p; ca_{1;0} \| ca_{2;0} \| \cdots \| ca_{F;0}) = c$ then $k = ca_{1;0} \| ca_{2;0} \| \cdots \| ca_{F;0}$ with
overwhelming probability.
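
The verification phase can be sketched as follows. This is only an illustration under stated assumptions: `verify_candidates` and `toy_enc` are hypothetical names, and `toy_enc` is a deliberately trivial stand-in for Rijndael so that the sketch stays self-contained. Candidate sets are assumed to be ordered with the most probable byte first.

```python
from itertools import product


def verify_candidates(candidate_sets, p, c, enc):
    """Return the first key ca_1 || ... || ca_F with enc(p, key) == c, else None.

    candidate_sets: list of F lists of byte values (the sets Ca_1, ..., Ca_F).
    enc: encryption function; a toy stand-in is used below, not Rijndael.
    """
    for combo in product(*candidate_sets):
        key = bytes(combo)
        if enc(p, key) == c:
            return key
    return None


def toy_enc(p: bytes, k: bytes) -> bytes:
    # Toy cipher for illustration only: XOR with the key, then rotate by one byte.
    x = bytes(a ^ b for a, b in zip(p, k))
    return x[1:] + x[:1]


secret = bytes([3, 7, 11, 200])
p = bytes([1, 2, 3, 4])
c = toy_enc(p, secret)                      # the known pair (p, c)
sets = [[5, 3], [7], [11, 12, 13], [1, 200]]
recovered = verify_candidates(sets, p, c, toy_enc)
```

The number of trial encryptions is at most the product of the candidate set sizes, which is why the sets must be kept small.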

5.3 The Pure Attack 

We begin with a definition. 

Definition 1. The Greek letter $\Delta$ denotes the extra time needed for the
multiplications of a byte with ‘02’ and ‘03’ if a shift occurs. The term $N(\mu, \sigma^2)$
denotes a normal distribution with mean $\mu$ and variance $\sigma^2$. In particular,
$f(t) := \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t-\mu)^2/(2\sigma^2)}$ denotes its Lebesgue density.

In the first step the attacker randomly chooses plaintexts $p_1, \ldots, p_N$ and
measures the running times $\tilde t_1, \ldots, \tilde t_N$ needed for their encryption with the
secret key k. In particular, $\tilde t_j = t_j + t_{err;j}$ with $t_j = \mathrm{Time}(\mathrm{Enc}(p_j; k))$ while $t_{err;j}$
denotes the measurement error. Then he initializes tables $U_1, \ldots, U_F$ where
$U_s := (u_s[i][j])_{0 \le i \le 255;\, 1 \le j \le N}$. The component $u_s[i][j]$ equals the most
significant bit of $S(i \oplus s\text{-th byte of } p_j)$. In other words, $u_s[i][j]$ tells - assuming the
$s$-th byte of the key, $R_s$, is equal to $i$ - whether the first multiplication by ‘02’
applied to the corresponding byte involves a shift or not.

If $R_s \ne i$ the measured running time $\tilde t_j$ can be interpreted as a realization
of a normally distributed random variable $T_j \sim N(\mu, \sigma^2)$ where $\sigma^2 = \sigma_A^2 + \sigma_{err}^2$
with $\sigma_A^2 := 4M_b(M-1)\Delta^2/4$ while $\sigma_{err}^2$ denotes the variance of the
measurement error. However, if $R_s = i$ the table entry $u_s[i][j]$ provides an additional
piece of information (concerning particular multiplications by ‘02’ and ‘03’).
Consequently, $T_j \sim N(\mu + \Delta(u_s[i][j] - 0.5), \sigma^2 - \Delta^2/4)$. The original question,
namely whether $R_s = i$, can be viewed as a more general problem which is
independent of any cryptographic context: namely, whether $T_j \sim N(\mu, \sigma^2)$ or
$T_j \sim N(\mu + \Delta(u_s[i][j] - 0.5), \sigma^2 - \Delta^2/4)$ for $j \le N$. The measured running times
$\tilde t_1, \ldots, \tilde t_N$ are the only information available. For this, statistical decision theory
can be applied.

Remark 6. (parameter estimation) For simplicity, first assume that the attacker 
is able to use a device identical to the one he wants to attack with a known 
key. He then randomly generates plaintexts $p'_1, \ldots, p'_m$ and $p''_1, \ldots, p''_m$ with
constant first byte in each subset such that for the known key for the respective
multiplications by ‘02’ and ‘03’ a shift occurs (subset 1), resp. does not occur
(subset 2). The difference in the mean values delivers an estimator for $\Delta$ (and
thus for $\sigma_A^2$) and their arithmetical mean an estimator for $\mu$. If the attacker
cannot use a known key the estimation strategy is similar; he just has to identify
two subsets with the properties from above, which can easily be done as follows:
he starts by building two subsets of plaintexts (A, B), in which the first byte of
every element of A and the first byte of every element of B are fixed to (two 
different) constants. If the average time for processing subset A significantly 
differs from that for subset B, then the attacker has identified the two desired 
sets; otherwise he simply repeats the operation with a third subset, comparing 
its average processing time with that of the second etc. 
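
A minimal sketch of the estimation step in Remark 6, assuming the two timing samples have already been collected. The function names and the default values $M_b = 4$, $M = 10$ are illustrative assumptions (they correspond to the 128-bit block, 10-round case).

```python
from statistics import mean


def estimate_parameters(times_shift, times_no_shift):
    """Estimate (Delta, mu) from timings of subset 1 (shift) and subset 2 (no shift)."""
    m1, m0 = mean(times_shift), mean(times_no_shift)
    delta_hat = m1 - m0            # difference of the two mean values
    mu_hat = (m1 + m0) / 2.0       # arithmetic mean of the two mean values
    return delta_hat, mu_hat


def algorithm_variance(delta, m_b=4, rounds=10):
    """Variance 4 * M_b * (M - 1) * Delta^2 / 4 caused by the multiplications."""
    return 4 * m_b * (rounds - 1) * delta ** 2 / 4


delta_hat, mu_hat = estimate_parameters([12.0, 14.0], [10.0, 12.0])
```

In practice both samples must be large enough for the difference of means to stand out from the noise; the tiny lists above only demonstrate the arithmetic.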

Roughly speaking, in a statistical decision problem the statistician (here: the
attacker) estimates the unknown distribution $p_\theta$ of random variable(s) on the
basis of an observation $\omega \in \Omega$. The set $\Theta$ describes all possible alternatives. We
will not consider statistical decision problems in full generality (cf., e.g. [17]) but
apply the mechanisms to our specific problem. Here the parameter set $\Theta$ equals
{0, 1} where $\theta = 0$ denotes the case $i = R_s$ while $\theta = 1$ stands for $i \ne R_s$. The
probability that for byte $i$ the hypothesis $\theta = 0$ (resp., $\theta = 1$) is true equals
$\eta_0 = 1/256$ (resp., $\eta_1 = 255/256$). If the attacker decides for $\theta = 0$ although
$\theta = 1$ is true he needlessly adds one candidate $i$ to $Ca_s$ which increases the time
for final exhaustive search. Erroneously deciding for $\theta = 1$, i.e. cancelling the
true byte value $R_s$, is much worse as the attack must fail in the end. To quantify
these considerations, we introduce the loss function $s(\theta, \theta') \ge 0$ where the first
component denotes the correct parameter and the second the estimated one.
Of course, s(0,0) = s(1,1) = 0 (correct decisions). We further set s(1,0) = 1
and s(0,1) = 100. Of course, the attacker uses the decision strategy $d_{opt}$ which
minimizes the expected loss, that is, $d_{opt}(\tilde t_1, \ldots, \tilde t_N) := u \in \{0, 1\}$ if u minimizes
the term



$$
\sum_{\theta=0}^{1} \eta_\theta\, s(\theta, u) \prod_{j=1}^{N} f_{\theta;j}(\tilde t_j). \qquad (11)
$$

Footnote: The ratio s(0,1)/s(1,0) is somewhat arbitrary. Increasing s(0,1) reduces the
probability that the correct value is cancelled but simultaneously increases the
candidate set $Ca_s$.

For the moment, $f_{1;j}$ (resp., $f_{0;j}$) denotes a normal density with mean $\mu$ and
variance $\sigma^2$ (resp., with mean $\mu + \Delta(u_s[i][j] - 0.5)$ and variance $\sigma^2 - \Delta^2/4$). As
the variances for $\theta = 0$ and $\theta = 1$ are almost equal the attacker may instead use
the simplified decision strategy which decides for $\theta = 0$ iff

$$
\sum_{j=1}^{N} \left( \left(\mu + \Delta(u_s[i][j] - 0.5) - \tilde t_j\right)^2 - \left(\mu - \tilde t_j\right)^2 \right) < 2\sigma^2 \left( \log(s(0,1)) - \log(255) \right) \qquad (12)
$$

with only minimal loss of efficiency. The attacker applies this decision rule
to each $i \in \{0, \ldots, 255\}$ which yields the candidate set $Ca_s$.
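
The pure attack on one key byte can be simulated directly from the statistical model. The sketch below is a toy experiment, not a measurement on a real implementation: the S-box is a random permutation standing in for the Rijndael S-box, timings are drawn from the assumed normal model, and all numerical values ($\mu$, $\Delta$, $\sigma^2$, the sample size and the key byte) are illustrative assumptions.

```python
import math
import random


def msb_table(sbox, plaintext_bytes):
    """u[i][j] = most significant bit of S(i XOR p_j) for every key-byte guess i."""
    return [[sbox[i ^ p] >> 7 for p in plaintext_bytes] for i in range(256)]


def pure_attack(times, u, mu, delta, sigma2, s01=100.0):
    """Return all key-byte guesses i accepted by the simplified decision rule (12)."""
    threshold = 2.0 * sigma2 * (math.log(s01) - math.log(255.0))
    candidates = []
    for i in range(256):
        stat = sum((mu + delta * (u[i][j] - 0.5) - t) ** 2 - (mu - t) ** 2
                   for j, t in enumerate(times))
        if stat < threshold:
            candidates.append(i)
    return candidates


# Simulated experiment; every number below is an illustrative assumption.
rng = random.Random(2001)
sbox = rng.sample(range(256), 256)        # random permutation, stand-in S-box
true_key_byte, n = 0x3C, 2000
mu, delta, sigma2 = 1000.0, 1.0, 4.0
plain = [rng.randrange(256) for _ in range(n)]
u = msb_table(sbox, plain)
noise_sd = math.sqrt(sigma2 - delta ** 2 / 4)
times = [mu + delta * (u[true_key_byte][j] - 0.5) + rng.gauss(0.0, noise_sd)
         for j in range(n)]
candidates = pure_attack(times, u, mu, delta, sigma2)
```

With a larger noise variance the same sample size yields larger candidate sets, which is the trade-off that the error location strategy of the next subsection addresses.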



5.4 Error Location and New Guesses 

This timing attack on Rijndael is an atypical divide and conquer attack as a 
wrong key estimator does not influence the following partial key guesses. In 
principle, the attacker could thus apply the decision strategy from the preceding 
section independently to all key bytes and completely dispense with any intermediate
error detection strategy. On the other hand, the candidate sets must be kept 
small enough for the verification phase to remain practically feasible. We propose 
the following strategy. 

We assume that the attacker has already determined the candidate sets
$Ca_1, \ldots, Ca_{s-1}$. To derive $Ca_s$ we initialize a matrix $(c_m, i)_{1 \le m \le |Ca_{s-1}|;\, 0 \le i \le 255}$
where $c_m \in Ca_{s-1}$. Our aim is to “tick” all components for which at least one
component is a correct candidate (for $R_{s-1}$ or $R_s$, resp.). If this can be managed
with reasonable probability, many ticks should occur in the row indexed by the
correct candidate $c_m = R_{s-1}$ and in the column indexed by $i = R_s$. This suggests
the following strategy: cancel all elements of $Ca_{s-1}$ besides y (let’s say $y \le 4$)
candidates whose rows had the most ticks; similarly, build a set $Ca_s$ containing
z (let’s say $z \le 20$) bytes for which in the corresponding columns the most ticks
occur.

An efficient “ticking strategy” remains to be derived. For each component
four cases are possible: Case A: ($c_m = R_{s-1}$, $i = R_s$), Case B: ($c_m = R_{s-1}$, $i \ne R_s$),
Case C: ($c_m \ne R_{s-1}$, $i = R_s$) and Case D: ($c_m \ne R_{s-1}$, $i \ne R_s$).
Then $T_j$ is normally distributed with mean $\mu + \Delta(u_{s-1}[c_m][j] + u_s[i][j] - 1.0)$,
$\mu + \Delta(u_{s-1}[c_m][j] - 0.5)$, $\mu + \Delta(u_s[i][j] - 0.5)$ or $\mu$, resp., and variance $\sigma^2 - \Delta^2/2$,
$\sigma^2 - \Delta^2/4$, $\sigma^2 - \Delta^2/4$, or $\sigma^2$, resp. For the moment the respective densities
are suggestively denoted with $f_{A;j}$, $f_{B;j}$, $f_{C;j}$, $f_{D;j}$. The cases A, B, C and
D occur with probabilities $\eta_A = 1/(256|Ca_{s-1}|)$, $\eta_B = 255/(256|Ca_{s-1}|)$,
$\eta_C = (|Ca_{s-1}| - 1)/(256|Ca_{s-1}|)$ and $\eta_D = 1 - \eta_A - \eta_B - \eta_C$. If Case D is true, the loss
for nevertheless ticking the component $(c_m, i)$ is set $s_D := 1$. If Case B or C is true, but
the component is not ticked, the loss equals $s_{BC} := 30$ whereas $s_A := 2 s_{BC}$ for
Case A. (Recall that we are only interested in the total number of ticks in a row
or column but not in their positions.) This leads to the following decision rule:




264 W. Schindler, F. Koeune, and J.-J. Quisquater 



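Tick the component iff

$$
\eta_D s_D \prod_{j=1}^{N} f_{D;j}(\tilde t_j) < \eta_A s_A \prod_{j=1}^{N} f_{A;j}(\tilde t_j) + s_{BC} \left( \eta_B \prod_{j=1}^{N} f_{B;j}(\tilde t_j) + \eta_C \prod_{j=1}^{N} f_{C;j}(\tilde t_j) \right). \qquad (13)
$$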
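
Rule (13) multiplies thousands of densities, so a direct evaluation underflows. A sketch of one way to evaluate it in the log domain is given below; the log-sum-exp step is our own implementation choice and not part of the paper, and the simulated data and parameter values are illustrative assumptions only.

```python
import math
import random


def log_normal_pdf(t, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (t - mean) ** 2 / (2.0 * var)


def tick(times, u_prev, u_cur, mu, delta, sigma2, n_prev, s_d=1.0, s_bc=30.0):
    """Decision rule (13) for one component (c_m, i), evaluated with logarithms.

    u_prev[j] = u_{s-1}[c_m][j], u_cur[j] = u_s[i][j], n_prev = |Ca_{s-1}|.
    """
    s_a = 2.0 * s_bc
    eta_a = 1.0 / (256 * n_prev)
    eta_b = 255.0 / (256 * n_prev)
    eta_c = (n_prev - 1) / (256 * n_prev)
    eta_d = 1.0 - eta_a - eta_b - eta_c
    if eta_d <= 0.0:                 # |Ca_{s-1}| = 1: Case D cannot occur
        return True
    la = lb = lc = ld = 0.0          # log-likelihoods of Cases A, B, C, D
    for t, up, uc in zip(times, u_prev, u_cur):
        la += log_normal_pdf(t, mu + delta * (up + uc - 1.0), sigma2 - delta ** 2 / 2)
        lb += log_normal_pdf(t, mu + delta * (up - 0.5), sigma2 - delta ** 2 / 4)
        lc += log_normal_pdf(t, mu + delta * (uc - 0.5), sigma2 - delta ** 2 / 4)
        ld += log_normal_pdf(t, mu, sigma2)
    terms = [math.log(eta_a * s_a) + la, math.log(eta_b * s_bc) + lb]
    if eta_c > 0.0:
        terms.append(math.log(eta_c * s_bc) + lc)
    top = max(terms)                 # log-sum-exp of the right-hand side
    rhs = top + math.log(sum(math.exp(x - top) for x in terms))
    return math.log(eta_d * s_d) + ld < rhs


# Simulated data drawn from the model (illustrative values only).
rng = random.Random(7)
n, mu, delta, sigma2 = 4000, 1000.0, 1.0, 4.0
u_prev_true = [rng.randrange(2) for _ in range(n)]
u_cur_true = [rng.randrange(2) for _ in range(n)]
u_prev_wrong = [rng.randrange(2) for _ in range(n)]
u_cur_wrong = [rng.randrange(2) for _ in range(n)]
sd = math.sqrt(sigma2 - delta ** 2 / 2)
times = [mu + delta * (a + b - 1.0) + rng.gauss(0.0, sd)
         for a, b in zip(u_prev_true, u_cur_true)]
tick_a = tick(times, u_prev_true, u_cur_true, mu, delta, sigma2, n_prev=4)    # Case A
tick_d = tick(times, u_prev_wrong, u_cur_wrong, mu, delta, sigma2, n_prev=4)  # Case D
```

Counting the ticks per row and per column of the matrix then gives the reduced set $Ca_{s-1}$ and the new set $Ca_s$ as described above.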

Like the attack itself, its error handling is atypical. There is no error
correction but some candidates (which are assumed to be false) are cancelled.
In the original meaning of the word there is no error detection either, as errors
must trivially have occurred if the previous candidate set contains more than
one element. However, some false estimators are located and cancelled. The
location of previous errors and the estimation of new key byte candidates are
done simultaneously. Compared with the pure attack, the number of operations
in the guessing phase increases linearly with the average size of the candidate
sets before their reduction. On the other hand, the number of operations in
the verifying phase decreases by an (exponential) factor of about $\beta^F$ where $\beta$
denotes the average ratio between the size of a candidate set after cancelling
wrong candidates and the size before the cancelling. (Note that $Ca_F$ can finally
be reduced by applying error location to $Ca_F \times \{0, \ldots, 255\}$.) Consequently,
before the cancelling, the candidate sets may be much larger than for the pure
attack, which means that the number of time measurements can be reduced
significantly.



5.5 Practical Results 

In [8] an attack was tested against the 128-bit block, 128-bit key version
of Rijndael with M = 10 rounds. It turned out that, with 3000 samples per key
byte, the complete key was recovered with very high probability at negligible
cost. The attack described above reduced the total number of time measurements
from 16 · 3000 = 48000 to 4000, with a success rate of more than 90%. The
set $Ca_1$ was determined with the pure attack, $Ca_2, \ldots, Ca_F$ with the strategy
described in the preceding subsection. The same time measurements were used
for all key bytes. The decisions within the pure attack and the error location
strategy for s = 2 were strongly correlated. Hence for s = 2 we used the full set
{0, . . . , 255} instead of $Ca_1$. A more time-efficient variant, which however requires
more time measurements, is to use another sample of size 2000 for the pure
attack.

Note that the sets $Ca_i$ are in fact ordered, with the most probable candidates
first. Therefore the final “exhaustive search” phase generally succeeds after
exploring a very small portion of the search space.

Although the new attack is much more efficient than the one in [8], both
attacks are based on the same basic idea (cf. Subsect. 5.2) and exploit the same
implementation weakness.

6 Algorithmically Based Divide and Conquer Attacks

An important design criterion for cryptographic algorithms is that it must not 
be possible to confirm or validate partial key guesses, with reasonable proba- 
bility, using algorithmically based attacks, i.e. attacks which do not consider 
implementation details. Divide and conquer attacks thus typically occur in side 
channel attacks. However, they are not restricted to that field. Siegenthaler’s 
attack (cf. [16]), for example, has been well-known for almost 20 years. 

In [16] Siegenthaler analyzes a divide and conquer attack on a particular class
of stream ciphers for which the key stream $k_1, k_2, \ldots$ (single bits) is generated by
m linear feedback shift registers $LFSR_1, \ldots, LFSR_m$ of length $r_1, \ldots, r_m$ with
primitive feedback polynomials $q_1, \ldots, q_m$ whose output values at time t,
denoted with $x_1(t), \ldots, x_m(t)$, are memoryless combined with $f: \{0,1\}^m \to \{0,1\}$.
More precisely, $k_t := f(x_1(t), \ldots, x_m(t))$ and $c_t := p_t \oplus k_t$ where $p_t$ and $c_t$ denote
the plaintext and ciphertext bit, resp. The searched key consists of the initial values
of $LFSR_1, \ldots, LFSR_m$ and, if the feedback polynomials are unknown, also of
the $q_i$. Let the random variables $X_1, \ldots, X_m$ be independent and equidistributed on
$\{0,1\}^{r_1}, \ldots, \{0,1\}^{r_m}$, resp. If $\varepsilon_i := \mathrm{Prob}(X_i = f(X_1, \ldots, X_m)) - 0.5 \ne 0$, the
$i$-th part of the key (the initial value of $LFSR_i$ and possibly $q_i$) may be guessed
independently from the remainder. More precisely, if the candidate for the $i$-th
key part is correct then $\#\{1 \le t \le N \mid x_i(t) = f(x_1(t), \ldots, x_m(t))\} \approx (0.5 + \varepsilon_i)N$
can be expected whereas $\approx 0.5N$ else. If $\varepsilon_i \ne 0$ for all i, this enables a
straightforward divide and conquer attack with known plaintext. In fact, even
a ciphertext-only attack is feasible (cf. [16]). Conversely, the use of correlation
immune combiners of high order (possibly with memory) prevents such attacks
(cf., e.g., [16], [12]) as the attacker would then have to guess many key parts
simultaneously, which is practically infeasible.
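
The counting argument can be illustrated with a toy instance. The sketch below is not taken from [16]: it assumes three short LFSRs combined by a multiplexer-type function (often called a Geffe generator), for which $\varepsilon_1 = 0.25$, and it assumes known plaintext so that the key stream is available. Register lengths, recurrences and initial states are arbitrary illustrative choices.

```python
from itertools import product


def lfsr_stream(state, taps, n):
    """Output n bits of a Fibonacci LFSR; the new bit is the XOR of state[t], t in taps."""
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def combiner(x1, x2, x3):
    """Memoryless combiner f: x2 selects x1 or x3, hence Prob(f = x1) = 0.75."""
    return (x1 & x2) ^ ((x2 ^ 1) & x3)


# Toy cipher; recurrences correspond to x^5+x^2+1, x^7+x+1 and x^6+x+1.
N = 2000
secret1 = (1, 0, 1, 1, 0)
s1 = lfsr_stream(secret1, (0, 2), N)
s2 = lfsr_stream((1, 1, 0, 1, 0, 0, 1), (0, 1), N)
s3 = lfsr_stream((0, 1, 1, 0, 0, 1), (0, 1), N)
keystream = [combiner(a, b, c) for a, b, c in zip(s1, s2, s3)]


def agreement(candidate):
    """#{t <= N : x_1(t) = k_t} for a candidate initial state of LFSR_1."""
    return sum(x == k for x, k in zip(lfsr_stream(candidate, (0, 2), N), keystream))


# Guess the first key part independently of the other two registers.
scores = {cand: agreement(cand) for cand in product((0, 1), repeat=5)}
best = max(scores, key=scores.get)
```

Only $2^5$ candidates are tested for the first register instead of $2^{18}$ joint states, which is the divide and conquer gain described above.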

As in the timing attack on Rijndael but unlike in the other attacks treated
in this paper the key parts can be guessed independently. If the available
sample size N is small (in relation to the $\varepsilon_i$), for each key part so many
candidates may remain that the verifying phase may be very costly. Depending on
the concrete combiner f, however, there may exist an obvious and readily
available error location strategy. Assume, for example, that the absolute value
$|\mathrm{Prob}(X_{j_1} \oplus \cdots \oplus X_{j_s} = f(X_1, \ldots, X_m)) - 0.5|$ is fairly large. Then we
can pre-check the cartesian product $Ca_{j_1} \times \cdots \times Ca_{j_s}$ of candidate sets. However,
unlike in the timing attack against Rijndael, there is no tendency for “ticks” in
particular regions of $Ca_{j_1} \times \cdots \times Ca_{j_s}$. More precisely, we can expect correlations
only for the s-tuple for which all components are correct; other pre-confirmed
s-tuples result from statistical deviations. We hence do not pre-confirm subsets
of $Ca_{j_1}, \ldots, Ca_{j_s}$ but a subset of $Ca_{j_1} \times \cdots \times Ca_{j_s}$ whose elements are the
remaining candidates for $LFSR_{j_1}, \ldots, LFSR_{j_s}$.

7 Conclusion 

This paper proposed a method to improve the error detection/correction step
in divide and conquer attacks, independently of the attack itself. Although the
method is not a “ready-to-use” tool, in the sense that some work and analysis is
necessary to instantiate it to a particular divide and conquer problem, we believe
the above examples have highlighted its basic principles, allowing a motivated
user to apply it with reasonable effort.

Moreover, the gain from an adequate error management policy is often large
enough (sometimes it reduces the necessary sample size by up to a factor of 2) to
make it worth this effort, especially in a real-world attack.

A good understanding of the potential power of an attack is necessary to be
able to design adequate countermeasures. This paper, focusing on an in-depth
analysis of the available data in order to exploit them in a nearly-optimal way,
attempted to give an insight into that potential power.

Abstract. This paper investigates interoperability problems arising
from the use of dissimilar key recovery mechanisms in encrypted
communications. The components that can cause interoperability problems are
identified and a protocol is proposed where two communicating entities
can negotiate the key recovery mechanism(s) to be used. The ultimate
goal is to provide the entities a means to agree either on a mutually
acceptable key recovery mechanism (KRM) or on different, yet interoperable,
mechanisms of their choice.

1 Introduction

As business increases its use of encryption for data confidentiality, the threats 
arising from the lack of access to decryption keys grow [2]. Although transient 
keys typically should not be retained, there is a potential need to access these 
keys during their lifetime, i.e. during the communication session (or afterwards 
if the company logs any communications). Corporations might want to access 
encrypted communications to check for malicious software or to track leakage of 
sensitive information. 

Key recovery mechanisms (KRMs) address this problem [3,10] by provid- 
ing the means to recover decryption keys. They can be divided into two types: 
key escrow and key encapsulation mechanisms. A key escrow mechanism [8] is
a method of key recovery (KR) where the secret or private keys, key parts or 
key-related information are stored by one or more key escrow agents. In a key en- 
capsulation mechanism, keys, key parts, or key-related information are enclosed 
in a KR block, which is typically attached to the data and encrypted specifi- 
cally for the key recovery agent (KRA). Here the terms key escrow agent and 
key recovery agent are considered synonymous, and refer to the trusted entity 
which responds to key recovery requests and potentially holds users’ key-related 
material. 

The variety of KRMs so far proposed, in conjunction with the lack of a stan- 
dard for KRMs, means that interoperability problems are likely to arise from 
the use of dissimilar KRMs in encrypted communications [11]. Interoperability,
as in [5], means the ability of entity A, using KRM $KRM_A$, to establish a
KR-enabled cryptographic association with entity B, using KRM $KRM_B$. Entities
deploying dissimilar KRMs may not know whether the remote party can deal 
with their KRM’s demands. This may force them to avoid key recovery, with 
associated increased risks. Note that, even if KRMs were standardised, interop- 
erability problems may still arise; standards tend to provide a variety of sound 
but not necessarily interoperable mechanisms. 

This paper addresses these interoperability problems. The components that 
can cause interoperability problems are identified and a protocol is proposed 
that, to a great extent, overcomes the interoperability problems. The protocol 
offers communicating parties the ability to agree either on a mutually accept- 
able KRM or on different, yet interoperable, mechanisms for their encrypted 
communications.



2 Key Recovery Enabled Communications 

In the context of communications between entities A and B, we give two scenarios 
where KRM use can affect the establishment of a cryptographic association. 

1. Entities A and B make use of KRMs $KRM_A$ and $KRM_B$ respectively, which
might be identical, compatible or dissimilar mechanisms. In the case of iden- 
tical or compatible mechanisms the two entities are not expected to face 
any problems. Problems, however, might arise if the two entities make use 
of dissimilar mechanisms. They are unlikely to be able to establish a secure 
communication while using their respective KRMs, as this would typically 
demand each entity to fulfil the requirements of the peer’s KRM. 

2. Entity A uses $KRM_A$ while B does not use KR. The issues that arise here
are whether B will be able to cope with $KRM_A$’s needs, and whether A will
be able to generate valid KR information. For the two entities to be able to 
communicate, assuming that A manages to generate valid KR information, 
B should at least be aware that A makes use of a KRM. This is important 
as B should not discard incoming traffic because of unrecognised KR fields 
that B cannot interpret. Another potential problem is whether A’s policy 
will permit the acceptance of incoming traffic that does not make use of KR. 
If A operates within a corporate environment this requirement is likely to 
be crucial, as the company might want to check incoming data for malicious 
software before they reach their destination. 

Communicating entities wanting KR functionality for encrypted communi- 
cations without the above problems must use interoperable KRMs. Also, any 
deployed cryptographic mechanisms with embedded KR functionality should be 
compatible with cryptographic products not using KR. These requirements will 
ensure that neither of the above scenarios will prevent the establishment of a
secure session.

3 Factors That Can Affect Interoperability

Many KRMs, especially key escrow mechanisms, demand the use of a specific 
mechanism for session key generation, and as such they can be considered as 
part of the key establishment protocol. This restriction is a cause of KRM in- 
teroperability problems. A KRM with this property demands compatibility of 
the underlying key establishment protocols, a requirement that is not always 
fulfilled. Key escrow mechanisms suffer more from this problem, as most require 
the use of a specific key establishment protocol. By contrast, key encapsulation 
schemes appear to be more adaptable in this respect, since they simply wrap 
the generated data encryption key under the KRA’s public encryption key (and 
hence potentially work with any key establishment protocol). 

Flexibility of key encapsulation mechanisms with respect to the underlying 
key establishment protocols does not always rule out interoperability problems. 
Interoperability very much depends on what additional requirements exist. For 
example, problems arise if the recipient of encrypted data needs to validate KR 
information, or the receiver relies on the sender to generate KR information. 
These needs will typically demand interaction between the two parties during
KR information generation/verification. If either party’s mechanism cannot
meet the peer’s demands, interoperability problems are likely to arise. This
problem is not restricted to key encapsulation schemes. In the case of key escrow
mechanisms, a requirement for participation of both entities in generating KR
information will have the same effect. We can therefore divide KRMs into two
classes, depending on their communications requirements during KR information
generation/verification.

1. KRMs where each entity generates KR information for its own use only, 
without peer assistance. If neither party requires verification of the peer’s 
KR information prior to decryption, interoperability issues become of minor 
importance and the parties will be able to use their respective KRMs. 

2. KRMs that require interaction between the two entities for the genera- 
tion/verification of KR information. Interaction might be needed in the fol- 
lowing cases. 

— Exchange of data is required for KR information generation. 

— The sender generates KR information both for his own and the peer’s 
needs. This is particularly relevant to single-message communications. 

— Either party wishes, e.g. for policy reasons, to verify the KR information 
generated by the peer. 

In situations like these interoperability is an issue that must be dealt with; 
otherwise, it is likely to lead to a failure to establish secure communications. 

In summary, the two factors that are likely to affect the interoperability of 
KRMs in encrypted communication sessions are: 

1. the KRMs’ dependence on the underlying key establishment protocol, and 

2. the interaction requirements between the communicating parties for the
generation and/or verification of KR information.

4 Interoperable Mechanisms

Based on the above analysis, a mechanism which is neither dependent on the 
underlying key establishment protocol nor needs any interaction with the peer for 
generation or verification of KR information, will always be interoperable with a 
KRM with the same requirements. The two mechanisms can work independently 
regardless of the underlying key establishment protocol. A mechanism with these 
requirements can also inter-operate with one that is dependent on an underlying 
key establishment protocol which both entities can deal with, as long as it does 
not require interaction with the peer for KR information generation/verification.

Interoperability problems are likely to arise in the following cases (we as- 
sume that the communicating parties can deal with all possible underlying key 
establishment protocols):

1. Both KRMs use specific key establishment mechanisms regardless of inter- 
action requirements. In this case the interoperability of the KRMs depends 
on the compatibility of the underlying cryptographic mechanisms. 

2. At least one KRM demands peer participation in KR information gener- 
ation/verification. For example, if the policy demands that the KR infor- 
mation for the receiver should be generated by the sender, then the sender 
must be able to handle the receiver’s mechanism. Otherwise, it is likely that 
establishment of secure communications will fail. 
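
The two problem cases can be expressed as a simple predicate. The sketch below is a simplification under stated assumptions: the `KRM` record, its field names and the mechanism names are hypothetical, and "compatibility" of key establishment mechanisms is reduced to equality of a label, which real mechanisms may not satisfy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KRM:
    name: str
    key_establishment: Optional[str]   # None: works with any key establishment protocol
    needs_peer_interaction: bool       # peer must help generate/verify KR information


def likely_interoperable(a: KRM, b: KRM,
                         a_handles=frozenset(), b_handles=frozenset()) -> bool:
    """a_handles / b_handles: names of foreign KRMs that entity A / B can also handle."""
    # Case 1: both KRMs demand a specific key establishment mechanism.
    if (a.key_establishment and b.key_establishment
            and a.key_establishment != b.key_establishment):
        return False
    # Case 2: a KRM that demands peer participation needs a peer able to handle it.
    if a.needs_peer_interaction and a.name not in b_handles:
        return False
    if b.needs_peer_interaction and b.name not in a_handles:
        return False
    return True


encap_1 = KRM("encap-1", None, False)
encap_2 = KRM("encap-2", None, False)
escrow_x = KRM("escrow-x", "protocol-x", False)
escrow_y = KRM("escrow-y", "protocol-y", False)
interactive = KRM("interactive", None, True)
```

The last parameter pair reflects the approach taken in this paper: an entity able to deal with more than one KRM can satisfy a peer whose mechanism demands participation.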

The chances of interoperability problems are therefore considerable, and a so- 
lution is needed. However, due to the significant differences in the characteristics 
of existing KRMs it is unlikely that a single model, such as the one proposed 
by the Key Recovery Alliance in [5] (see also [11]), can apply to all of them. 
This latter model mainly addresses problems arising from the transmission of 
KR information in proprietary formats, and suggests as a solution the use of 
a wrapper, namely the “Common Key Recovery Block”. However, as described
in [9], it fails to achieve one of its main objectives, which is to offer the ability 
for validation of KR information by the peer, and does not deal with the situa- 
tion where KR information cannot be generated because of the use of dissimilar 
mechanisms. 

A different approach is described in this paper, which requires the entities to 
be able to deal with more than one KRM. This enables some of the difficulties 
described above to be avoided. 

5 A KRM Negotiation Protocol 

We now describe a protocol designed to enable two communicating parties to 
negotiate the KRM to be used in an encrypted communication session. Its main 
objective is to deal with situations where the parties wish to make use of different, 
non-interoperable, KRMs. 

A similar protocol specifically designed to allow the negotiation of KRMs
using the Internet Security Association and Key Management Protocol (ISAKMP)
[7] is described in [1]. This model, however, adopts the mechanism described in
[5] for the transmission of KR information, which, as mentioned in the previous 
section, has been shown to have problems. A more generalised model is described 
here that considers the different requirements of various mechanisms and the ad- 
ditional requirements that might arise regarding the exchange of cryptographic 
certificates. Moreover, the proposed protocol can be used to provide key recovery 
functionality in the application layer, in contrast to the mechanism proposed by 
the Key Recovery Alliance which targets the IP layer. 

Note that in the protocol description we refer to the two parties as ‘Client’ 
and ‘Server’; this is so as to follow the client-server model terminology as closely 
as possible. 



5.1 The Proposed Scheme 

The protocol consists of the following steps (messages marked with an asterisk
are optional; more detailed descriptions of the exchanged messages are given in
the next section).

Client                                Server

ClientHello            -------->
                                      ServerHello
                                      Certificate*
                                      CertificateRequest*
                       <--------      ServerHelloDone
Certificate*
KRParameters*
Finished               -------->
                       <--------      Finished

The client first sends the ClientHello message, to which the server responds 
with the ServerHello. With these two messages the two entities exchange the 
parameters necessary for KRM negotiation. After the ServerHello (if required 
by the selected KRM(s)) the server sends the Certificate message containing the 
appropriate certificates, requests client certificates with the CertificateRequest, 
and, finally, sends the ServerHelloDone message. The client responds with the 
optional Certificate message, containing the certificates specified in the Cer- 
tificateRequest, the optional KRParameters message, containing any additional 
information required by the selected KRMs, and, finally, the Finished message. 
The server verifies the received Finished message and, if successful, responds with 
a similar Finished message. On receipt, the client verifies it and, if successful, 
the negotiation terminates successfully. 



5.2 Exchanged Messages 

In the following sections the exchanged messages are described in detail.

Client Hello. The client, as previously mentioned, initiates the protocol by
sending the first message of the negotiation protocol (ClientHello). The
ClientHello contains a list of KRMs (from a complete list of mechanisms the protocol
supports) that the client is willing to use, in decreasing order of preference. A
default mechanism that all parties are assumed to be able to use can be included 
in the list. 

With the ClientHello the client must also inform the server whether he wants 
to resume a previous session by including the identifier in the appropriate field. 
If this field is empty a new session id should be assigned by the server. 



Server Hello. If the client does not request the resumption of a previous session, 
or if the server wants to initiate a new one, the server must assign a new session 
id, which will be sent with the ServerHello message. Otherwise, the server will 
respond with the session id included in the ClientHello and proceed with the 
Finished message. 

If the server initiates a new session, he indicates the mechanism that he 
wants the client to use, and the mechanism that the server will use. The two 
mechanisms, if not identical, must be interoperable. For this purpose, a list of all 
possible matches of interoperable mechanisms has to be kept by both entities. 
If an acceptable match is not found in the list, the server can either terminate 
the negotiation protocol unsuccessfully, or choose the default mechanism if this 
is in the list sent by the client. Otherwise the server drops the session. 
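
The server's choice can be sketched as a lookup in the table of interoperable matches. The function below is an illustration only: its name, the argument layout and the rule that the client's preference order is scanned first are our assumptions, since the protocol description leaves the server's selection policy open.

```python
def select_mechanisms(client_prefs, server_prefs, interoperable, default=None):
    """Return (client_krm, server_krm), or None if the negotiation is terminated.

    client_prefs: KRMs offered in the ClientHello, most preferred first.
    server_prefs: KRMs the server is willing to use itself, most preferred first.
    interoperable: set of (client_krm, server_krm) pairs known to interoperate.
    default: optional default mechanism that all parties are assumed to support.
    """
    for c in client_prefs:
        for s in server_prefs:
            if c == s or (c, s) in interoperable:
                return c, s
    # No acceptable match: fall back to the default only if the client offered it.
    if default is not None and default in client_prefs:
        return default, default
    return None


table = {("KRM-A", "KRM-B")}   # hypothetical list of interoperable matches
choice = select_mechanisms(["KRM-C", "KRM-A"], ["KRM-B"], table)
fallback = select_mechanisms(["KRM-C", "KRM-DEFAULT"], ["KRM-B"], table,
                             default="KRM-DEFAULT")
failure = select_mechanisms(["KRM-C"], ["KRM-B"], table, default="KRM-DEFAULT")
```

A result of `None` corresponds to the server terminating the negotiation unsuccessfully.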

If the selection of the mechanism for both entities is a KRM that can itself 
handle the exchange of certificates and related KR parameters, the two parties 
can terminate the negotiation protocol and leave this KRM to take charge. To 
achieve this the server will send a Finish message (after the ServerHello) to 
indicate that control is now to be passed to the negotiated KRM(s). 

Finally, within the ServerHello the server also includes the KRParameters 
field, which carries any additional information that the client has to possess to 
be able to deal with the server’s KRM. 



Certificate and KRM related information exchange. Depending on the 
selection of the KRM, and if the server has not sent a Finish message, the server 
proceeds with the Certificate message. This message is optional and contains the 
required certificates (for the chosen mechanisms) for the generation and/or ver- 
ification of KR information. Following that, and depending on the requirements 
of the chosen KRM(s), the server can also send a request for the correspond- 
ing client’s certificates using the CertificateRequest. The purpose of the Cer- 
tificateRequest is to give the client a list of specific types of certificates needed 
by the server, and a list of certification authorities trusted by the server. 
After the CertificateRequest the server sends the ServerHelloDone message, which 
indicates that the server has completed his Hello messages. 

On receipt of the ServerHelloDone, if the client has received a Certifi- 
cateRequest he responds with his Certificate message, which contains the re- 
quested certificates, assuming that he is in possession of the appropriate ones. 
Further, the client sends in the optional KRParameters message any additional 
information required by the KRM selected for the client. Note that the 
corresponding KRParameters for the server’s KRM are sent as part of the 
ServerHello message. 

Finish messages. If the client is satisfied with the current selection of mecha- 
nisms he sends the Finish message, which indicates that the client is willing to 
proceed with the current selection of mechanisms. Subsequently, the client waits 
for the corresponding server’s Finish message, whose receipt indicates successful 
execution of the protocol. 

5.3 Protecting the Integrity of the Hello Messages 

After the execution of the above protocol the two entities are not sure whether 
any of the exchanged messages have been altered during transmission by an ad- 
versary, as the specified protocol includes no proper integrity checks. Moreover, 
neither of the communicating parties authenticates the other. Assuming, how- 
ever, that the KRMs that can be negotiated are sound, the protocol does not 
introduce any vulnerabilities to the secrecy of the session key. The only attack 
that an adversary can mount against the protocol is to alter the Hello messages 
exchanged between the two entities in an attempt to downgrade the negotiated 
KRM(s) to one(s) that the attacker regards as weaker. Such an attack will only 
force the two entities to make use of less favourable mechanisms. 

To avoid such problems we propose enhancing the previously proposed pro- 
tocol. These enhancements provide the following security services. 

— Integrity of the exchanged messages. 

— Assurance that the Hello messages exchanged are not a replay from a 
previous session. 

Additionally, the mechanism provides mutual authentication of the commu- 
nicating parties. Note, however, that mutual authentication is not a requirement 
for the negotiation protocol. It is a property derived from the use of digital sig- 
natures. The modifications proposed are as follows. 

— The client generates a random value randC, which he sends to the server 
with the ClientHello message. 

— The server generates a random value randS, which he sends to the client 
with the ServerHello message. 

— The client’s Finish message becomes 

    S_C(ClientHello || ServerHello || randC || randS) 

and the server’s Finish message becomes 

    S_S(ServerHello || ClientHello || randS || randC) 

where S_U(M) is U’s signature on data M and “||” denotes concatenation. 
The rest of the messages remain as previously defined. On receipt of the 
respective Finish messages the two entities check the signatures, and if either 
of the two verification checks fails the protocol terminates unsuccessfully 
(this indicates that at least one of the ClientHello and ServerHello messages 
might have been altered during transmission). The modified protocol thus deals 
with the threat of modification of the exchanged Hello messages by an 
adversary. The generated random values protect against replay attacks, i.e. 
attacks in which an adversary uses old exchanged messages to subvert the 
protocol. The cost of this countermeasure, however, is the introduction of 
signatures, which have to be supported by an appropriate public key 
infrastructure. In practice the two variants could co-exist, and an extra field 
in the Hello messages could be used to indicate which variants of the protocol 
are supported. 



6 Properties and Discussion 

The proposed protocol offers the communicating parties a means of negotiating 
the KRMs to be used for their encrypted communications. Given that this ne- 
gotiation will affect the selection of the key establishment protocol, execution of 
the proposed protocol must take place before the establishment of any session 
keys. It might also be the case that the two entities are obliged to use a specific 
key establishment protocol. This will simply restrict the number of mechanisms 
that the two entities will be able to negotiate. 

The negotiating entities will be able to choose different KRMs as long as 
there are no conflicts between the underlying key establishment mechanisms. 
Therefore, the choice of the key recovery mechanism(s) will only be affected 
by the compatibility of the underlying key establishment protocols. If these are 
compatible, the two parties will be able to use the negotiated KRMs, overcom- 
ing efficiently any interoperability problems that the two parties would have 
otherwise faced. 

Finally, note that in order to achieve the degree of agreement needed, the 
KRM negotiation process and the KRMs to be negotiated need to be the subject 
of a standardisation process of some type (e.g. via the IETF). This standard 
will need to include agreed identifiers for a large set of KRMs. 



7 Conclusions 

The introduction of a large number of KRMs and their use in encrypted com- 
munications is likely to lead to interoperability problems between KR-enabled 
encryption products. In this paper the factors that can cause interoperability 
problems have been identified. Following a different approach to the single model 
solution proposed by the Key Recovery Alliance, a protocol has been proposed 
that gives communicating entities the means to negotiate the KRMs to be used. 
The parties can make use of different, yet interoperable, KRMs matching their 
needs for the specific communication session. 
Abstract. Key agreement protocols are based either on some computational 
infeasibility, such as the computation of discrete logarithms [1], or on a 
theoretical impossibility under the assumption that Alice and Bob own specific 
devices such as a quantum channel [2]. In this article, we propose a new key 
agreement protocol called CHIMERA which requires no specific device. This 
protocol is based on a generalization we propose of the reconciliation 
algorithm. The protocol is proved to be unconditionally secure. 



1 Introduction 

The security of cryptographic systems is based either on a computational 
infeasibility or on a theoretical impossibility. However, some cryptographic 
problems have no known unconditionally secure solution. For example, the key 
agreement problem has computationally secure solutions, such as the 
Diffie-Hellman protocol [1], but no unconditionally secure solution under the 
assumption that Alice and Bob have no specific equipment such as a quantum 
channel, a deep-space radio source, or a satellite. 

Our work is inspired by these protocols and uses a generalized version of an 
interactive error-correcting algorithm proposed by C.H. Bennett and G. Brassard 
in [2]. This algorithm, called reconciliation, fits the parameters of the 
quantum channel, but is insecure for our protocol because of some properties of 
the sequences we use. The first part of this paper presents the generalization 
of the reconciliation algorithm. 

The next part is a presentation of CHIMERA, which is a key agreement 
protocol with unconditional security. It uses information-theoretic algorithms 
such as generalized reconciliation and extended Huffman coding. 

In [3], U. Maurer gives a general description of key agreement protocols and 
the conditions a key agreement protocol must satisfy to be secure [4], [5]. We 
recall these conditions and prove that CHIMERA satisfies all of them if the 
value of a parameter of the protocol is in a given range. Next, we propose a 
particular value of this parameter in the given range to optimize the length of 
the key created by CHIMERA. 
© Springer-Verlag Berlin Heidelberg 2001 







C. Prissette 



2 Generalized Reconciliation 

2.1 Bennett and Brassard’s Reconciliation 

The reconciliation process is, as described in [2], an iterative algorithm which 
destroys errors between two binary sequences A and B owned by Alice and Bob. 
The destruction of the errors is secure even if Eve listens to the insecure 
channel used by Alice and Bob to perform reconciliation. The algorithm does not 
destroy all errors between the two sequences in one round, but it can be 
repeated several times to destroy statistically all the errors. The price to 
pay to obtain two identical sequences is the sacrifice of bits of the sequences 
and thus the reduction of the length of the sequences. 

Here is the algorithmic description of one round of reconciliation: 

Alice and Bob cut their sequences A and B into sub-sequences of length k. For 
each sub-sequence $(A_i, \ldots, A_{i+k-1})$ from A and 
$(B_i, \ldots, B_{i+k-1})$ from B, they send each other (on the public insecure 
channel) the parity of their sub-sequence. 

— If the parity of the sub-sequence $(A_i, \ldots, A_{i+k-1})$ differs from the 
parity of the sub-sequence $(B_i, \ldots, B_{i+k-1})$, Alice and Bob destroy 
their respective sub-sequences. 

— Else Alice and Bob destroy respectively $A_{i+k-1}$ and $B_{i+k-1}$, and keep 
$(A_i, \ldots, A_{i+k-2})$ and $(B_i, \ldots, B_{i+k-2})$. 

The principle is simple: if the parities differ, then the sub-sequences differ, 
so if Alice and Bob destroy these sub-sequences, they destroy (at least) one 
error between the two sequences. 

On the other hand, if the parities are equal, this does not mean that the two 
sub-sequences are equal. However, Eve knows one bit of information about the 
sub-sequence, so Alice and Bob destroy one bit from their sub-sequence. 

Obviously, the reconciliation works only if the sequences A and B are close 
enough, and is secure only if Eve has no information about A and B before the 
reconciliation. For example, if she knows with certainty the value of one bit 
from A and B and if Alice and Bob use sub-sequences of length two, she learns 
from the parities of the sub-sequences the whole sub-sequences, and so the bit 
kept if the parities are equal. 

2.2 Generalized Reconciliation 

Sometimes, in particular in CHIMERA, the parity of a sub-sequence reveals 
more information than the entropy of one bit of the sub-sequence. This happens, 
for example, when $p(A_i = 0) < p(A_i = 1)$. 

The generalized reconciliation algorithm REC(k,n), which is as follows, lets 
Alice and Bob sacrifice n symbols (instead of only one) of their sub-sequences 
of length k when the parities are equal. 

Alice and Bob cut their sequences A and B into sub-sequences of length k. For 
each sub-sequence $(A_i, \ldots, A_{i+k-1})$ from A and 
$(B_i, \ldots, B_{i+k-1})$ from B, they send each other (on the public insecure 
channel) the parity of their sub-sequence. 

— If the parity of the sub-sequence $(A_i, \ldots, A_{i+k-1})$ differs from the 
parity of the sub-sequence $(B_i, \ldots, B_{i+k-1})$, Alice and Bob destroy 
their respective sub-sequences. 

— Else Alice and Bob destroy respectively $(A_{i+k-n}, \ldots, A_{i+k-1})$ and 
$(B_{i+k-n}, \ldots, B_{i+k-1})$, and keep $(A_i, \ldots, A_{i+k-n-1})$ and 
$(B_i, \ldots, B_{i+k-n-1})$. 

The principle is the same as in Bennett and Brassard’s reconciliation REC(k,1): 
if the parities differ, then the sub-sequences contain errors, so Alice and Bob 
destroy the sub-sequences. Otherwise, Alice and Bob destroy more information 
than the information revealed by the parities. 

The generalization of the reconciliation algorithm is very useful in our 
protocol, called CHIMERA, which uses REC(3,2). Actually, in this protocol the 
sequences are biased but the entropy of two bits is always greater than the 
entropy of the parity of three bits. This property is proved in Section 7. 
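
One round of REC(k,n) as described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name and the list-of-bits representation are assumptions, both parities are compared locally instead of being exchanged over a channel, and a trailing incomplete sub-sequence is simply dropped.

```python
from typing import List, Tuple


def rec_round(a: List[int], b: List[int], k: int, n: int) -> Tuple[List[int], List[int]]:
    """One round of REC(k, n) on two equal-length lists of bits."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not 1 <= n < k:
        raise ValueError("need 1 <= n < k")
    new_a: List[int] = []
    new_b: List[int] = []
    # Only complete sub-sequences are processed; a trailing partial one is dropped.
    for i in range(0, len(a) - k + 1, k):
        block_a, block_b = a[i:i + k], b[i:i + k]
        if sum(block_a) % 2 == sum(block_b) % 2:
            # Parities agree: sacrifice the last n symbols, keep the first k - n.
            new_a.extend(block_a[:k - n])
            new_b.extend(block_b[:k - n])
        # Parities differ: the whole sub-sequence is destroyed.
    return new_a, new_b
```

Calling `rec_round(a, b, k, 1)` gives the original Bennett and Brassard round, and `rec_round(a, b, 3, 2)` the variant used by CHIMERA. Note that a sub-sequence with an even number of errors passes the parity check, so a kept bit can still differ.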



3 Presentation of CHIMERA 

CHIMERA is a key agreement protocol. We present it with parameters which are 
optimal and ensure its security. The choice of the values used in CHIMERA is 
explained in the study of the protocol which follows this presentation. 

The following protocol allows Alice and Bob to build a common secret quantity 
of length 128 bits. 

— Alice builds a binary sequence $A^{[0]}$ with the following properties: 

• $|A^{[0]}| = 2000000$ 

• $\forall i,\ p(A_i^{[0]} = 1) = p_b = \frac{3}{16}$ 

— Bob builds a binary sequence $B^{[0]}$ with the following properties: 

• $|B^{[0]}| = 2000000$ 

• $\forall i,\ p(B_i^{[0]} = 1) = p_b = \frac{3}{16}$ 

— Alice and Bob repeat 6 times the following reconciliation algorithm REC(3,2) 
on their respective sequences (we denote by $A^{[k]}$ and $B^{[k]}$ Alice’s and 
Bob’s sequences after k rounds of reconciliation). 



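    $l \leftarrow 0$ 
    forall $i$ such that ($i < |A^{[k]}| - 2$ and $i \bmod 3 = 0$) 
        if $\bigoplus_{j=0}^{2} A^{[k]}_{i+j} = \bigoplus_{j=0}^{2} B^{[k]}_{i+j}$ 
        then 
            $A^{[k+1]}_l \leftarrow A^{[k]}_i$ 
            $B^{[k+1]}_l \leftarrow B^{[k]}_i$ 
            $l \leftarrow l + 1$ 
        end if 
    end forall 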



— Alice compresses the sequence $A^{[6]}$ with the extended Huffman code 
$\mathcal{H}$ using 11-tuples as symbols of the language. The resulting 
sequence is the key S. 

— Bob compresses the sequence $B^{[6]}$ with the extended Huffman code 
$\mathcal{H}$ using 11-tuples as symbols of the language. The resulting 
sequence is the key S'. 

Alice and Bob have the same quantity S = S' of length 128. 

4 Properties of Key Agreement Protocols 

In [3], U. Maurer gives the properties a key agreement protocol has to satisfy. 
These properties come from [4] and [5]. They are conditions of soundness and 
security. 

Considering that Eve is passive, a key agreement protocol which creates 
binary sequences S and S' by exchanging t messages $C_1, \ldots, C_t$ between 
Alice and Bob must satisfy the three conditions: 

— $P[S \neq S'] \approx 0$: Alice and Bob must obtain the same sequence with a 
very high probability. 

— $H(S) \approx |S|$: the key must be very close to uniformly distributed. 

— $I(S; C^t Z) \approx 0$: Eve has no information about S, considering her 
initial knowledge Z and her eavesdropping of the insecure channel. 

Moreover, the goal of the key agreement is to make the key S as long as 
possible. 

CHIMERA satisfies each of these properties. The proof that each property is 
satisfied is given in the three following sections of this paper. For each 
proof, we assume that the bias $p_b$ of the initial sequences $A^{[0]}$ and 
$B^{[0]}$ is in the range $[0 : \frac{1}{2})$, and we search for the conditions 
on this parameter that CHIMERA has to respect to work and be secure. We also 
assume the reconciliation needs r rounds to create identical sequences and the 
extended Huffman code uses n-tuples. 

Then, under the conditions on $p_b$ obtained in each proof, we explain the 
choice of the values $p_b = \frac{3}{16}$, r = 6 and n = 11. 

5 Proof of the Property $P[S \neq S'] \approx 0$ 

The proof of the property $P[S \neq S'] \approx 0$ is based on the study of the 
evolution of the distance between Alice’s sequence $A^{[i]}$ and Bob’s sequence 
$B^{[i]}$ after i rounds of reconciliation. 



5.1 Definition : Normalized Distance 

The normalized distance $d_N(A, B)$ between two sequences of bits A and B is 
defined as the ratio between the Hamming distance $d_H(A, B)$ and the length 
|A| of the sequences. 

$$d_N(A, B) = \frac{d_H(A, B)}{|A|} \qquad (1)$$
5.2 Initial Normalized Distance 

Let $p_b$ be the bias of the random generators. The initial normalized distance 
is a function of $p_b$. The following table presents the four possible values 
of the couple $(A_i^{[0]}, B_i^{[0]})$ with their occurrence probability. 

Table 1. Possible values of $(A_i^{[0]}, B_i^{[0]})$ with occurrence probability. 

    A_i^{[0]}   B_i^{[0]}   p(A_i^{[0]}, B_i^{[0]})
    0           0           $(1 - p_b)^2$
    1           1           $p_b^2$
    0           1           $p_b(1 - p_b)$
    1           0           $p_b(1 - p_b)$

In the last two cases $A_i^{[0]}$ and $B_i^{[0]}$ differ, so 
$p(A_i^{[0]} \neq B_i^{[0]}) = 2p_b(1 - p_b)$. This result can be extended to the 
whole sequences to obtain the average Hamming distance 
$d_H(A^{[0]}, B^{[0]}) = 2|A^{[0]}|p_b(1 - p_b)$. So the initial normalized 
distance between $A^{[0]}$ and $B^{[0]}$ is: 

$$d_N(A^{[0]}, B^{[0]}) = 2p_b(1 - p_b). \qquad (2)$$

In CHIMERA, we set $p_b \in [0 : \frac{1}{2})$. So we have the following range 
for the initial normalized distance, which is a function of the bias of the 
random generators used to build $A^{[0]}$ and $B^{[0]}$: 

$$d_N(A^{[0]}, B^{[0]}) \in [0 : \tfrac{1}{2}). \qquad (3)$$



5.3 Evolution of the Normalized Distance $d_N(A^{[k]}, B^{[k]})$ 

Let $d_N(A^{[k]}, B^{[k]})$ be the normalized distance between $A^{[k]}$ and 
$B^{[k]}$ after k rounds of reconciliation with the algorithm REC(3,2). The 
following table presents the 32 possible values of the two 3-tuples 
$(A_i^{[k]}, A_{i+1}^{[k]}, A_{i+2}^{[k]})$ and 
$(B_i^{[k]}, B_{i+1}^{[k]}, B_{i+2}^{[k]})$ with their occurrence probability 
when the bits $A_i^{[k]}$ and $B_i^{[k]}$ are kept (i is a multiple of 3). 

The first 16 cases give $A_l^{[k+1]} = B_l^{[k+1]}$, which means that the 
reconciliation REC(3,2) works and the distance reduces. Conversely, the last 16 
cases give $A_l^{[k+1]} \neq B_l^{[k+1]}$: the reconciliation REC(3,2) fails and 
the distance increases. 

The normalized distance $d_N(A^{[k+1]}, B^{[k+1]})$ after one more round of 
reconciliation REC(3,2) is a function of $d_N(A^{[k]}, B^{[k]})$. It is given by 
the ratio between the probability of the last 16 cases and the probability of 
the 32 cases (we set $d_N = d_N(A^{[k]}, B^{[k]})$): 

$$d_N(A^{[k+1]}, B^{[k+1]}) = \frac{2(1 - d_N)d_N^2}{3(1 - d_N)d_N^2 + (1 - d_N)^3}. \qquad (4)$$
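
The recurrence (4) is easy to iterate numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes the initial distance $2p_b(1 - p_b)$ of (2) together with the proposed bias $p_b = 3/16$.

```python
from typing import List


def next_distance(d: float) -> float:
    """Equation (4): normalized distance after one more round of REC(3,2)."""
    kept_and_different = 2 * (1 - d) * d ** 2           # equal parities, kept bits differ
    kept = 3 * (1 - d) * d ** 2 + (1 - d) ** 3          # equal parities (bit kept)
    return kept_and_different / kept


def distance_history(p_b: float, rounds: int) -> List[float]:
    """Distances d_N(A[k], B[k]) for k = 0..rounds, starting from equation (2)."""
    d = 2 * p_b * (1 - p_b)
    history = [d]
    for _ in range(rounds):
        d = next_distance(d)
        history.append(d)
    return history


if __name__ == "__main__":
    for k, d in enumerate(distance_history(3 / 16, 6)):
        print(f"round {k}: d_N = {d:.3e}")
```

With $p_b = 3/16$ the distance starts near 0.30 and shrinks at every round, which is the behaviour the following section proves in general.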
5.4 Limit of the Normalized Distance 

Proving that 
$\forall d_N(A^{[0]}, B^{[0]}) \in [0 : \frac{1}{2}),\ \lim_{r \to +\infty} d_N(A^{[r]}, B^{[r]}) = 0$ 
is equivalent to proving $P[S \neq S'] \approx 0$. We do not consider the last 
computation of the protocol (the Huffman coding of the sequences) because Alice 
and Bob obtain the same sequence after this compression if they have the same 
sequence before this compression. So we only have to prove that the normalized 
distance between $A^{[r]}$ and $B^{[r]}$ tends to zero before the Huffman 
coding, i.e. after the reconciliation rounds. 

The limits of $d_N(A^{[k]}, B^{[k]})$ are the roots of the equation 

$$d = \frac{2(1 - d)d^2}{3(1 - d)d^2 + (1 - d)^3}. \qquad (5)$$

This equation can be rewritten as: 

$$d(1 - d)\left(d - \tfrac{1}{2}\right)^2 = 0. \qquad (6)$$

Obviously, the roots of this equation, and so the possible limits of the 
normalized distance between $A^{[k]}$ and $B^{[k]}$ after k rounds of 
reconciliation, are $\{0, \frac{1}{2}, 1\}$: 

$$\lim_{k \to +\infty} d_N(A^{[k]}, B^{[k]}) \in \{0, \tfrac{1}{2}, 1\}. \qquad (7)$$

Let us now consider the case $d_N(A^{[0]}, B^{[0]}) \in [0 : \frac{1}{2})$ seen 
in (3), which is encountered in CHIMERA, and study the limit of the normalized 
distance between $A^{[k]}$ and $B^{[k]}$ for this initial range of values. In 
this range, the next inequality is true (with equality only for d = 0): 

$$\forall d \in [0 : \tfrac{1}{2}),\quad \frac{2(1 - d)d^2}{3(1 - d)d^2 + (1 - d)^3} \le d. \qquad (8)$$

So, rewriting the inequality with the normalized distance evolution function 
(4), we have: 

$$\forall d_N(A^{[0]}, B^{[0]}) \in [0 : \tfrac{1}{2}),\quad d_N(A^{[k+1]}, B^{[k+1]}) \le d_N(A^{[k]}, B^{[k]}). \qquad (9)$$

For $d_N(A^{[0]}, B^{[0]}) \in [0 : \frac{1}{2})$, the sequence 
$\{d_N(A^{[k]}, B^{[k]})\}_{k \ge 0}$ is decreasing and bounded. So it is 
convergent, and since its limit must be a root of (6) smaller than 
$\frac{1}{2}$, its limit is 0. 

$$\forall d_N(A^{[0]}, B^{[0]}) \in [0 : \tfrac{1}{2}),\quad \lim_{k \to +\infty} d_N(A^{[k]}, B^{[k]}) = 0. \qquad (10)$$

So after enough rounds of reconciliation, denoted r, the normalized distance 
between $A^{[r]}$ and $B^{[r]}$ becomes as close to zero as wanted. This means 
that the sequences are equal, with a very high probability. 

$$\forall d_N(A^{[0]}, B^{[0]}) \in [0 : \tfrac{1}{2}),\ \forall \epsilon > 0,\ \exists r,\quad d_N(A^{[r]}, B^{[r]}) < \epsilon. \qquad (11)$$

Choosing $\epsilon$ very close to 0, we can write: 

$$P[A^{[r]} \neq B^{[r]}] \approx 0. \qquad (12)$$

Obviously, the Huffman coding $\mathcal{H}$ does not change this result. We 
denote by $\mathcal{H}(A^{[r]})$ and $\mathcal{H}(B^{[r]})$ the Huffman coding 
of $A^{[r]}$ and $B^{[r]}$ respectively. So, 

$$P[\mathcal{H}(A^{[r]}) \neq \mathcal{H}(B^{[r]})] \approx 0. \qquad (13)$$

As defined in CHIMERA, the sequences $\mathcal{H}(A^{[r]})$ and 
$\mathcal{H}(B^{[r]})$ are the keys and can be denoted, in accordance with [3], 
S and S'. So, we have: 

$$P[S \neq S'] \approx 0. \qquad (14)$$
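
The step from the fixed-point equation of the distance recurrence to its factorized form can be worked out explicitly. The derivation below is a reconstruction under the assumption that the recurrence is the ratio given in (4), with $d$ standing for the normalized distance:

```latex
\begin{align*}
d &= \frac{2(1-d)d^2}{3(1-d)d^2 + (1-d)^3}
   = \frac{2d^2}{3d^2 + (1-d)^2}
   = \frac{2d^2}{4d^2 - 2d + 1}
   && (d \neq 1) \\
\Longleftrightarrow\quad 0 &= d\,(4d^2 - 2d + 1) - 2d^2
   = d\,(4d^2 - 4d + 1)
   = 4d\left(d - \tfrac{1}{2}\right)^2 .
\end{align*}
% Keeping the common factor (1-d) instead of dividing by it gives
% d(1-d)(d - 1/2)^2 = 0, whose roots are 0, 1/2 and 1.
```

The same computation shows why the distance decreases below one half: for $0 < d < \frac{1}{2}$, $\frac{2d^2}{4d^2 - 2d + 1} < d$ is equivalent to $(2d - 1)^2 > 0$.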
6 Proof of the Property $|S| \approx H(S)$ 

The proof of the property $|S| \approx H(S)$ is based on the evaluation of the 
normalized weight of the sequences $A^{[r]}$ and $B^{[r]}$ and on a property of 
the Huffman code. 

6.1 Definition : Normalized Weight 

The normalized weight $\omega_N(A)$ of the binary sequence A is defined as the 
ratio between the Hamming weight $w_H(A)$ and the length |A|. 

$$\omega_N(A) = \frac{w_H(A)}{|A|}. \qquad (15)$$

Of course, the initial normalized weight of the sequences $A^{[0]}$ and 
$B^{[0]}$ is equal to $p_b$. 

6.2 Residual Normalized Weight 

We consider the residual normalized weight of the sequences $A^{[r]}$ and 
$B^{[r]}$, i.e. when the condition $P[S \neq S'] \approx 0$ is satisfied. We 
denote by $p_k$ the probability of keeping a bit after r rounds of 
reconciliation. This probability, which we will not evaluate now, is a function 
of the number of reconciliation rounds (each round divides the length of the 
sequences by three, at least) and of the normalized distance of the sequences 
for each round of reconciliation (the closer the sequences are, the higher the 
probability of keeping a given bit). 

As we keep only identical bits and sacrifice a certain amount of bits for 
security, the following table presents the two values the i-th bit of 
$A^{[r]}$ and $B^{[r]}$ can have, with the probability associated with each 
case. 

Table 3. Possible values of $A_i^{[r]}$ and $B_i^{[r]}$ with occurrence probability 

    A_i^{[r]}   B_i^{[r]}   p(A_i^{[r]}, B_i^{[r]})
    0           0           $(1 - p_b)^2 p_k$
    1           1           $p_b^2 p_k$

Obviously, the normalized weight of $A^{[r]}$ (and $B^{[r]}$) at the end of the 
reconciliation is: 

$$\omega_N(A^{[r]}) = \frac{p_b^2 p_k}{(1 - p_b)^2 p_k + p_b^2 p_k} = \frac{p_b^2}{(1 - p_b)^2 + p_b^2}. \qquad (16)$$

This result is validated by simulations, as one can see in the following graph 
representing $\omega_N(A^{[r]})$ as a function of $p_b$: 
Fig. 1. This graph shows $\omega_N(A^{[r]})$ as a function of $p_b$. The curve 
is given by the theory. The dots are simulation results 

Note that for pb > j, the simulation results are noisy because the residual 
length of the sequence becomes too small. So, we will avoid this range of value 
for the bias of the random generators used to build Alice and Bob’s sequences. 
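
The residual weight of (16) and its entropy can be evaluated directly. The following sketch is an illustration with hypothetical function names; it assumes the bias $p_b = 3/16$ proposed in the protocol and uses exact fractions for the weight.

```python
from fractions import Fraction
from math import log2


def residual_weight(p_b: Fraction) -> Fraction:
    """Equation (16): normalized weight of the sequences after reconciliation."""
    return p_b ** 2 / ((1 - p_b) ** 2 + p_b ** 2)


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary source emitting 1 with probability p."""
    p = float(p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


if __name__ == "__main__":
    w = residual_weight(Fraction(3, 16))
    print(w, float(w), binary_entropy(w))
```

For $p_b = 3/16$ the residual weight is exactly 9/178, about 0.0506, and its entropy is about 0.2888 bit per symbol.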

6.3 Entropy of $\mathcal{H}(A^{[r]})$ 

As $\omega_N(A^{[r]}) < \frac{1}{2}$, the entropy of $A^{[r]}$ is not maximal 
[6]. However, the last stage of the protocol is the compression of the 
sequences with an extended Huffman code. It is well known that using big 
t-tuples as the symbols of the language improves the compression ratio. With 
big enough t-tuples, the compression ratio is close to the entropy of the 
sequence. Denoting by $\mathcal{H}$ the extended Huffman code, we have: 

$$H(\mathcal{H}(A^{[r]})) \approx |\mathcal{H}(A^{[r]})|. \qquad (17)$$

As $\mathcal{H}(A^{[r]})$ is the sequence S, we can rewrite the preceding 
equation: 

$$H(S) \approx |S|. \qquad (18)$$

7 Proof of the Property $I(S; C^t Z) \approx 0$ 

The proof of the property $I(S; C^t Z) \approx 0$ is based on the comparison of 
the amount of information revealed and sacrificed by the reconciliation 
algorithm. We will only study the cases in which bits are kept: when the bits 
are destroyed because they are different, the information that Eve can gather 
is useless. 

Moreover, as Eve has no information about $A^{[0]}$ and $B^{[0]}$, we can 
forget Z and just prove that 

$$I(S; C^t) \approx 0. \qquad (19)$$

7.1 Information Sacrificed by the Reconciliation 

Let us consider the reconciliation of the 3-tuples 
$(A_i^{[k]}, A_{i+1}^{[k]}, A_{i+2}^{[k]})$ from Alice’s sequence and 
$(B_i^{[k]}, B_{i+1}^{[k]}, B_{i+2}^{[k]})$ from Bob’s sequence (i is a multiple 
of 3). When for a given 3-tuple one bit is kept, 2 bits are destroyed. 
Moreover, the sacrificed bits are independent from each other. So, the amount 
of information sacrificed is 

$$H_s = 2H(\omega_N(A^{[k]})). \qquad (20)$$

7.2 Information Revealed by the Reconciliation 

Now, let us consider the information revealed by the reconciliation, i.e. the 
parity of the 3-tuple $(A_i^{[k]}, A_{i+1}^{[k]}, A_{i+2}^{[k]})$: 

$$H(C_i^{[k+1]}) = H\left(\bigoplus_{j=0}^{2} A_{i+j}^{[k]}\right). \qquad (21)$$

The following table gives the probability of occurrence of each case (we write 
$\omega = \omega_N(A^{[k]})$): 

Table 4. Possible values of $(A_i^{[k]}, A_{i+1}^{[k]}, A_{i+2}^{[k]})$ with 
occurrence probability. 

    A_i   A_{i+1}   A_{i+2}   parity   probability
    0     0         0         0        $(1 - \omega)^3$
    0     1         1         0        $(1 - \omega)\omega^2$
    1     0         1         0        $(1 - \omega)\omega^2$
    1     1         0         0        $(1 - \omega)\omega^2$
    1     0         0         1        $(1 - \omega)^2\omega$
    0     1         0         1        $(1 - \omega)^2\omega$
    0     0         1         1        $(1 - \omega)^2\omega$
    1     1         1         1        $\omega^3$

From the last four cases, we have: 

$$\omega_N\left(\bigoplus_{j=0}^{2} A_{i+j}^{[k]}\right) = 3(1 - \omega_N(A^{[k]}))^2\,\omega_N(A^{[k]}) + (\omega_N(A^{[k]}))^3, \qquad (22)$$

which gives us the entropy of the parity: 

$$H\left(\bigoplus_{j=0}^{2} A_{i+j}^{[k]}\right) = H\left(3(1 - \omega_N(A^{[k]}))^2\,\omega_N(A^{[k]}) + (\omega_N(A^{[k]}))^3\right). \qquad (23)$$

So, 

$$H(C_i^{[k+1]}) = H\left(3(1 - \omega_N(A^{[k]}))^2\,\omega_N(A^{[k]}) + (\omega_N(A^{[k]}))^3\right). \qquad (24)$$

7.3 Comparison between $H_s$ and $H(C_i^{[k+1]})$ 

Obviously, we want the amount of information sacrificed to be greater than the 
amount of information revealed: 

$$H_s > H(C_i^{[k+1]}). \qquad (25)$$

With (20) and (24), it becomes 

$$2H(\omega_N(A^{[k]})) > H\left(3(1 - \omega_N(A^{[k]}))^2\,\omega_N(A^{[k]}) + (\omega_N(A^{[k]}))^3\right). \qquad (26)$$

The following graph shows $H(C_i^{[k+1]})$ and $H_s$ as functions of $p_b$. 

Fig. 2. This graph shows $H(C_i^{[k+1]})$ and $H_s$ as functions of $p_b$. For 
$p_b > \frac{1}{20}$ the amount of information revealed is less than the amount 
of information sacrificed 

This inequality is true for $\omega_N(A^{[k]}) \in [\frac{1}{20} : \frac{1}{2}]$. 
To ensure the security of the protocol, the inequality must be true for each 
round of the reconciliation: 

$$\forall k \le r,\quad \omega_N(A^{[k]}) \in [\tfrac{1}{20} : \tfrac{1}{2}]. \qquad (27)$$
As $\{\omega_N(A^{[k]})\}_{0 \le k \le r}$ is decreasing and 
$\omega_N(A^{[0]}) < \frac{1}{2}$, we just have to prove that the normalized 
weight of the residual sequence $A^{[r]}$ after the reconciliation is greater 
than $\frac{1}{20}$: 

$$\omega_N(A^{[r]}) \ge \frac{1}{20}. \qquad (28)$$

Using (16), this inequality becomes 

$$\frac{p_b^2}{(1 - p_b)^2 + p_b^2} \ge \frac{1}{20}. \qquad (29)$$

So, the reconciliation algorithm REC(3,2) is secure if 

$$p_b \ge \frac{\sqrt{19} - 1}{18}. \qquad (30)$$

It means that Eve gathers no information from the communications $C^t$ between 
Alice and Bob if the initial normalized weight of the sequences is in the range 
$[\frac{\sqrt{19} - 1}{18} : \frac{1}{2}]$. Under this condition, we have: 

$$I(S; C^t) \approx 0. \qquad (31)$$

Moreover, as Eve has no initial sequence Z, we can write: 

$$I(S; C^t Z) \approx 0. \qquad (32)$$
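
The security condition can be checked numerically. The sketch below is an illustration with hypothetical function names; it assumes the sacrificed information is twice the binary entropy of the weight and the revealed information is the entropy of the parity of three bits, as in the comparison above, and it evaluates the lower bound $(\sqrt{19} - 1)/18$ on the bias.

```python
from math import log2, sqrt


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary source emitting 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def sacrificed(w: float) -> float:
    """Information destroyed when two bits of weight w are sacrificed."""
    return 2 * binary_entropy(w)


def revealed(w: float) -> float:
    """Information revealed by the parity of three bits of weight w."""
    return binary_entropy(3 * (1 - w) ** 2 * w + w ** 3)


def residual_weight(p_b: float) -> float:
    """Normalized weight after reconciliation for an initial bias p_b."""
    return p_b ** 2 / ((1 - p_b) ** 2 + p_b ** 2)


def minimal_bias() -> float:
    """Smallest bias whose residual weight is still 1/20."""
    return (sqrt(19) - 1) / 18


if __name__ == "__main__":
    for w in (0.04, residual_weight(3 / 16), 0.06, 0.5):
        print(f"w = {w:.4f}: sacrificed {sacrificed(w):.4f}, revealed {revealed(w):.4f}")
    print("minimal bias:", minimal_bias())
```

The margin is small near the lower end of the range: at the residual weight obtained for $p_b = 3/16$ the two quantities differ by roughly a thousandth of a bit, while at a weight of 0.04 the parity already reveals more than is sacrificed.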



8 Choice of the Parameter $p_b$ 

8.1 Constraints on the Choice of $p_b$ 

The bias of the random generators used to build $A^{[0]}$ and $B^{[0]}$ is the 
most important parameter of CHIMERA, as the security and the efficiency of the 
protocol depend on the value of $p_b$. 

As seen in the proof of the property $|S| \approx H(S)$, the bias $p_b$ should 
not exceed the upper bound noted in Section 6.2 for the protocol to be 
efficient. Moreover, the proof of the property $I(S; C^t Z) \approx 0$ states 
that CHIMERA is safe if $p_b$ is greater than $\frac{\sqrt{19} - 1}{18}$. So, 
the bias of the random generators must be chosen between 
$\frac{\sqrt{19} - 1}{18}$ and that upper bound. 

8.2 Simulation Results 

We have made simulations with sequences $A^{[0]}$ and $B^{[0]}$ of length 
$2 \cdot 10^6$ bits. The bias of the random generators is set in the range 
$[0 : \frac{1}{2})$ (although only the range allowed by Section 8.1 is really 
useful in CHIMERA) and the reconciliation round is repeated while Alice’s and 
Bob’s sequences are different. 

Then, we have considered the residual length of the sequences weighted by the 
entropy of the normalized weight of the sequences, i.e. the length of the 
sequences compressed with an optimal compression code (like the extended 
Huffman code). The results of these simulations are presented in the following 
graph. The x-axis is the bias $p_b$ and the y-axis is the residual length |S|. 
Fig. 3. This graph shows the residual length |S| as a function of $p_b$. The 
value of $p_b$ we propose is 0.1875 

As stated in [3], the goal is to make |S| as large as possible. In the 
admissible range, one can see two clouds of points: the first one, located in 
$[\frac{\sqrt{19} - 1}{18} : \approx 0.22]$, groups the results of the 
simulations with six rounds of reconciliation. The other cloud of points groups 
the results of the simulations with seven rounds of reconciliation. 

As one can see, in the range $[\frac{\sqrt{19} - 1}{18} : \approx 0.22]$ the 
residual length |S| is greater than the length |S| obtained for biases above 
0.22. Moreover, in the first range six rounds of reconciliation, instead of 
seven rounds, are needed. So we have to choose 
$p_b \in [\frac{\sqrt{19} - 1}{18} : \approx 0.22]$. 

Moreover, as the first cloud decreases with $p_b$, the bias of the random 
generator should be close to $\frac{\sqrt{19} - 1}{18}$. For implementation 
convenience, we propose to use: 

$$p_b = \frac{3}{16}. \qquad (33)$$

8.3 Creation of a Biased Random Generator for $p_b = \frac{3}{16}$ 

The bias $p_b$ can easily be obtained with a combination of non-biased random 
generators. For example, considering the outputs a, b, c and d of four 
non-biased random generators, the logical combination 

$$p = a \cdot b \cdot c + a \cdot b \cdot d \qquad (34)$$

(where $\cdot$ denotes the logical operator AND and $+$ denotes the logical 
operator OR) is a biased random generator of bias $p_b = \frac{3}{16}$. 

Such a simple construction can be implemented in any environment and lets 
Alice and Bob build their initial sequences with very light computation. As the 
other parts of the protocol need very light calculations (XOR and Huffman 
coding with pre-calculated trees), our intent is to make the creation of the 
sequences as easy as the rest of the protocol. 
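
The bias of this combination can be verified by enumerating the sixteen input combinations. The sketch below is an illustration: the function names are hypothetical, and the use of Python's `secrets` module as the source of the four unbiased bits is an assumption made for the example, not something the paper specifies.

```python
import itertools
import secrets
from fractions import Fraction
from typing import List


def biased_bit(a: int, b: int, c: int, d: int) -> int:
    """Logical combination (a AND b AND c) OR (a AND b AND d)."""
    return (a & b & c) | (a & b & d)


def exact_bias() -> Fraction:
    """Probability of output 1, by enumeration of the 16 equiprobable inputs."""
    outputs = [biased_bit(*bits) for bits in itertools.product((0, 1), repeat=4)]
    return Fraction(sum(outputs), len(outputs))


def biased_sequence(length: int) -> List[int]:
    """Biased sequence built from four unbiased bits per output bit."""
    sequence = []
    for _ in range(length):
        r = secrets.randbits(4)  # four unbiased bits (assumed source)
        sequence.append(biased_bit(r & 1, (r >> 1) & 1, (r >> 2) & 1, (r >> 3) & 1))
    return sequence
```

The output is 1 exactly when a and b are both 1 and at least one of c, d is 1, that is in 3 of the 16 cases.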

9 Parameter of the Extended Huffman Code 

The efficiency of the Huffman code depends on the number of symbols of the 
language on which the Huffman tree is based. For example, if only two symbols 
appear, whatever their frequencies, the Huffman tree will be a simple root. 
But if one considers n-tuples of symbols as the symbols of a language, the 
Huffman code becomes more and more efficient as n increases. The compression 
ratio is, of course, bounded by the entropy of the language. 

For the last stage of CHIMERA, we have to find a size of the n-tuples such 
that a 128-bit key created with CHIMERA has at least 127 bits of entropy. The 
method to find n is simple: we calculate the minimum-redundancy code for an 
increasing n, with the algorithm presented in [7], until we find a compression 
ratio $\mathcal{R}_n$ such that: 

$$128 \cdot \frac{H(\omega_N(A^{[r]}))}{\mathcal{R}_n} > 127. \qquad (35)$$

The following table presents, for a given n, the compression ratio of the 
minimum-redundancy code obtained with n-tuples as symbols, and the entropy 
of a 128-bit key created with this minimum-redundancy code: 



Table 5. Compression ratio and entropy of the key for a given length of the 
extended Huffman code. 

    n     R_n      H(S)
    1     1        36.9
    2     0.5745   64.3
    3     0.4347   85.1
    4     0.3685   100.2
    5     0.3378   109.4
    6     0.3179   116.3
    7     0.3056   121.0
    8     0.3007   122.9
    9     0.2971   124.4
    10    0.2936   125.8
    11    0.2905   127.2

The compression ratio for n = 11 is close enough to the entropy 
$H(\omega_N(A^{[6]})) \approx 0.28878$ to obtain a key with an entropy greater 
than 127. 
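
The compression ratios in the table can be approximated with a standard Huffman construction over n-tuples. The sketch below is an illustration rather than the algorithm of [7]: it assumes the bits of the reconciled sequence are independent with weight 9/178 (the residual weight for $p_b = 3/16$), uses hypothetical function names, and computes only the expected code length, not the code itself.

```python
import heapq
from math import comb, log2


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary source emitting 1 with probability p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def huffman_ratio(w: float, n: int) -> float:
    """Average Huffman code length per source bit, with n-tuples as symbols."""
    # An n-tuple with j ones has probability w^j (1-w)^(n-j); C(n, j) such tuples exist.
    heap = []
    for j in range(n + 1):
        heap.extend([w ** j * (1 - w) ** (n - j)] * comb(n, j))
    heapq.heapify(heap)
    expected_length = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        expected_length += merged  # each merge adds one bit to every leaf below it
        heapq.heappush(heap, merged)
    return expected_length / n


def key_entropy(w: float, n: int, key_bits: int = 128) -> float:
    """Entropy of a key of key_bits compressed bits, as in the table above."""
    return key_bits * binary_entropy(w) / huffman_ratio(w, n)


if __name__ == "__main__":
    w = 9 / 178
    for n in range(1, 12):
        print(n, round(huffman_ratio(w, n), 4), round(key_entropy(w, n), 1))
```

Under these assumptions the ratio for n = 2 comes out at about 0.5746, in line with the second row of the table.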
Considering bigger n-tuples, one has a better approximation of the entropy. 
Nevertheless, the Huffman tree needs more memory: with 11-tuples, the 
compression table (i.e. the Huffman tree) needs 

10 Average Length of the Keys 

The length of the keys can be easily calculated knowing the length 
$|A^{[0]}|$. As we empirically need 6 rounds of reconciliation to have 
$P(S \neq S') \approx 0$, we set r = 6 for $p_b = \frac{3}{16}$. 



10.1 Residual Length 

The number of bits kept after a reconciliation round is a function of the 
normalized distance between the sequences: the closer the sequences are, the 
fewer 3-tuples are destroyed. 

As one bit is kept when the 3-tuples have the same parity, and none if the 
parities differ, denoting by $R(d_N(A^{[k]}, B^{[k]}))$ the reduction factor of 
the sequence, we have (with i a multiple of 3): 

$$R(d_N(A^{[k]}, B^{[k]})) = \frac{1}{3}\, P\left(\bigoplus_{j=0}^{2} A_{i+j}^{[k]} = \bigoplus_{j=0}^{2} B_{i+j}^{[k]}\right). \qquad (36)$$

Considering the 3-tuples with the same parity, the table in Section 5.3 gives, 
setting $d_N = d_N(A^{[k]}, B^{[k]})$: 

$$R(d_N) = \frac{(1 - d_N)^3 + 3(1 - d_N)d_N^2}{3}. \qquad (37)$$

As the reconciliation is an iterative process, the length $|A^{[0]}|$ is 
reduced six times, with a ratio depending on the normalized distance between 
Alice’s and Bob’s sequences before each round of reconciliation REC(3,2). So, 
the length $|A^{[6]}|$ is: 

$$|A^{[6]}| = |A^{[0]}| \prod_{i=0}^{5} R(d_N(A^{[i]}, B^{[i]})). \qquad (38)$$

Of course, $d_N(A^{[i]}, B^{[i]})$ is given for each iteration by (4). 



10.2 Length of the Key S 

At the end of the reconciliation, Alice and Bob own respectively the sequences
$A^{[6]}$ and $B^{[6]}$ of length $k = |A^{[6]}|$ and of normalized weight
$w_N(A^{[6]})$. The normalized weight is given by (16):

$$w_N(A^{[6]}) = \frac{\bigl(w_N(A^{[0]})\bigr)^2}{\bigl(1 - w_N(A^{[0]})\bigr)^2 + \bigl(w_N(A^{[0]})\bigr)^2}. \qquad (39)$$
These sequences, equal with a very high probability, are compressed at the
end of the protocol with an extended Huffman code whose compression ratio is
very close to the entropy of the sequence. Thus, the length of the key is:

$$|S| = H\bigl(\mathrm{con}(A^{[6]})\bigr) \cdot |A^{[6]}|. \qquad (40)$$

From (38) and (39), we have : 

With the extended Huffman code of length n = 11, the practical length of
the keys is:

$$|S^{p}| = \eta_n\, |A^{[0]}| \prod_{i=0}^{5} R\bigl(d_N(A^{[i]}, B^{[i]})\bigr). \qquad (42)$$

The evaluation of this formula gives:

$$|S^{p}| \approx 6.37 \cdot 10^{-5}\, |A^{[0]}|. \qquad (43)$$

So Alice and Bob can create a common key of 128 bits with initial sequences
of length 2,000,000 bits.
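
As a quick consistency check of these figures, assuming the proportionality factor in (43) is $6.37 \cdot 10^{-5}$, the arithmetic can be written out as follows (a worked calculation added for illustration, not taken from the source):

```latex
6.37 \cdot 10^{-5} \times 2 \cdot 10^{6} \approx 127.4 \ \text{bits},
\qquad
\frac{128}{6.37 \cdot 10^{-5}} \approx 2.01 \cdot 10^{6} \ \text{bits}.
```

So initial sequences of about two million bits yield a key of approximately 128 bits.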

11 Conclusion 

The main points addressed in this paper are : 

- A generalized definition of reconciliation has been proposed to let the users
destroy more than one symbol of their sequences. The generalization is useful
when the entropy of the reconciled sequences is not maximal.

- An unconditionally secure key agreement protocol, called CHIMERA, has
been proposed. Its soundness and its security have been proved.
CHIMERA uses no specific devices, unlike other unconditionally secure key
agreement protocols.

- Convenient parameters have been given for a practical implementation of
CHIMERA.
Abstract. Encryption in wireless networks is essential to protect sensitive
information and to prevent fraud. Furthermore, wireless devices, such as
palmtops, Bluetooth devices, and mobile phones, require an efficient encryption
algorithm which is secure, small, fast, and simple to implement. There are
several stream ciphers available, the best known being A5/1, used in GSM,
and RC4, used in the 802.11 standards. However, these stream ciphers are weak
against cryptographic attacks [1] [2] [3] [4] [5] [6]. In this paper, a new
synchronous stream cipher (Alpha1) is proposed. Alpha1 is robust against
these attacks, small, fast, and very simple to implement in small wireless devices
with low processing power and size (e.g. Bluetooth devices, mobile phones, access
points).



1 Introduction 

Stream ciphers can be designed to be exceptionally fast, much faster in fact than any 
block cipher. While block ciphers operate on large blocks of data, stream ciphers 
typically operate on smaller units of plaintext, usually bits. A stream cipher generates 
a keystream and encryption is provided by combining the keystream with the 
plaintext, usually with the bitwise XOR operation. The generation of the keystream 
can be independent of the plaintext and ciphertext (yielding what is termed a 
synchronous stream cipher) or it can depend on the data and its encryption (in which 
case the stream cipher is said to be self-synchronizing). The majority of the
encryption algorithms that protect the air interface in mobile and wireless networks
(e.g. GSM, Bluetooth) use synchronous stream ciphers built on linear feedback
shift registers.

A Linear Feedback Shift Register (LFSR) is a mechanism for generating a 
sequence of binary bits. The register consists of a series of cells that are set by an 
initialisation vector that is, most often, the secret key. The behaviour of the register is 
regulated by a clock and at each clocking instant, the contents of the cells of the 
register are shifted right/left by one position, and the XOR of a subset of the cell 
contents is placed in the leftmost/rightmost cell. One bit of output is usually derived 
during this update procedure. LFSRs are fast and easy to implement in both hardware 
and software. With a judicious choice of feedback taps, the sequences that are
generated can have a good statistical appearance. LFSRs are therefore useful as
building blocks in more secure systems (e.g. A5/1 in GSM), but such systems still
suffer from two recent cryptographic attacks described by Biryukov and Shamir: the
biased birthday attack and the random subgraph attack.
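
The register update described above can be sketched in a few lines. The example below is a generic Fibonacci LFSR, not the Alpha1 registers: it uses a hypothetical 4-cell register with feedback from cells 0 and 1, which corresponds to the primitive polynomial x^4 + x + 1 and therefore cycles through all 15 non-zero states. The function name and the bit-numbering convention are illustrative choices.

```python
def lfsr_run(state, taps, nbits, steps):
    """Clock a Fibonacci LFSR `steps` times; return (output bits, final state).

    `state` is an integer whose bit 0 is the rightmost cell; `taps` lists the
    cell indices that are XORed together to form the feedback bit.
    """
    mask = (1 << nbits) - 1
    out = []
    for _ in range(steps):
        out.append(state & 1)                # the rightmost cell is the output bit
        feedback = 0
        for t in taps:                       # XOR of a subset of the cell contents
            feedback ^= (state >> t) & 1
        # Shift right by one position and place the feedback in the leftmost cell.
        state = ((state >> 1) | (feedback << (nbits - 1))) & mask
    return out, state


bits, final_state = lfsr_run(0b0001, taps=(0, 1), nbits=4, steps=15)
print(bits)          # one full period of the output sequence
print(final_state)   # the register is back in its initial state
```

With a primitive feedback polynomial of degree n the period is 2^n - 1, which is why the choice of taps matters for the statistical quality of the sequence.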

The main idea of the biased birthday attack [1] is to consider sets A and B from the
LFSRs which are not chosen with uniform probability distribution among all the
possible states. The observation which makes this attack efficient is that in
LFSR-based stream ciphers there is a huge variance in the weights of the various
states that begin with a specific α-bit pattern. The register bits that affect clock
control and the register bits that affect the output are unrelated for about α clock
cycles because of the single clock control tap. This decreases the number of states
to be sampled to $2^{n} \cdot 2^{-\alpha} = 2^{n-\alpha}$.
It was also found in A5/1 that the weight of a large percentage of the states was zero
because their trees died out before reaching depth 100. As a result, the initial
states can be determined efficiently, since the exact location of α and the
depth of the initial state are known. This is made possible by the frequent
initialisation and the poor choice of the clocking taps.

The main idea of the random subgraph attack is to make most of the special states
accessible by simple computations from the subset of special states which are actually
stored on the hard disk [1]. The attack is based on a function f that maps special
states in an easily computable way, and Hellman's time-memory tradeoff for block
ciphers, $M\sqrt{T} = |U|$, described in [4], can be used to invert it easily. In A5/1
the attack is applied to the subspace of $2^{48}$ special states because it can be
efficiently sampled. The time-memory tradeoff formula of the random subgraph attack
results in $M = 2^{36}$ and just $T = 2^{24}$.
This number of steps can be carried out in several minutes on a fast PC. Therefore, it
is essential to create a stream cipher which is robust against these attacks, small in
size and very simple to implement in hardware.



2 Description 

In general, Alpha1 is a synchronous clock-controlled cipher and operates by
expanding a short key into an infinite pseudo-random key stream. The sender uses a
Boolean exclusive-OR (XOR) gate to XOR the key stream with the plaintext to
produce the ciphertext. The receiver has a copy of the same key, and uses it to
generate an identical key stream. XORing the key stream with the ciphertext yields
the original plaintext. The key stream can be generated based on a combination of the
system specifications that are used.

The cipher uses four linear feedback shift registers (LFSRs) of length 29, 31, 33,
and 35, denoted by R1, R2, R3, and R4 respectively. The four registers are
maximum-length LFSRs with maximum periods $2^{29} - 1$, $2^{31} - 1$, $2^{33} - 1$,
and $2^{35} - 1$ respectively. When the R2 and R3 registers are clocked, their outputs
are combined using a Boolean AND gate; the output of this AND gate is then an input
to a Boolean exclusive-OR (XOR) gate, together with the outputs of R1 to R4, to
produce the final output stream (Figure 1). The AND gate was added to the cipher to
increase the linear complexity and avoid attacks based on linear statistical
weakness [3]. Moreover, several more complicated combinations of AND gates have been
designed and tested in Alpha1 to increase the linear complexity further. However,
they are not recommended in real systems because the stream cipher becomes slow and
impractical in mobile and wireless devices.




Fig. 1. Alpha1 Stream Cipher



The total length of the registers is 128 and the feedback polynomials used are all
primitive. The primitive polynomials have been generated by computer software
based on [7]. The Hamming weight of the feedback polynomials is chosen to be
five for R1, R2, and R3, and six for R4 (see Table 1); this represents a reasonable
trade-off between reducing the number of required XOR gates in the hardware
realization and obtaining good statistical properties for the generated sequences.



Table 1. Primitive Polynomials and Hamming Weight

Register   Polynomial f_j(x)      Weight
R1         x^29 + … + 1           5
R2         x^31 + … + 1           5
R3         x^33 + … + 1           5
R4         x^35 + … + 1           6
Let $x_1, x_2, x_3$, and $x_4$ denote the element of the polynomial with highest
degree in R1, R2, R3, and R4 respectively. The output y of Alpha1 is given by the
following equation:

$$y = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_2 \cdot x_3 \qquad (1)$$

If $L_1, L_2, L_3$, and $L_4$ denote the highest degree of the primitive polynomials
in R1, R2, R3, and R4 respectively then, according to (1), the linear complexity of
Alpha1 is given by [7]:

$$L = L_1 + L_2 + L_3 + L_4 + L_2 L_3 = 29 + 31 + 33 + 35 + 1023 = 1151 \qquad (2)$$
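
The output function and the linear-complexity arithmetic can be restated as a short sketch. The function name `alpha1_output` and the dictionary of register lengths are illustrative; the sketch only covers the combining step, not the register clocking.

```python
def alpha1_output(x1, x2, x3, x4):
    """Combine the four register output bits: XOR of all four plus the AND of R2 and R3."""
    return x1 ^ x2 ^ x3 ^ x4 ^ (x2 & x3)


# Register lengths from the text; the AND term contributes the product L2 * L3.
LENGTHS = {"R1": 29, "R2": 31, "R3": 33, "R4": 35}
linear_complexity = sum(LENGTHS.values()) + LENGTHS["R2"] * LENGTHS["R3"]
print(linear_complexity)

# Because x1 enters only linearly, the output is balanced over the 16 input combinations.
ones = sum(alpha1_output(a, b, c, d)
           for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(ones)
```

The single AND gate is what lifts the linear complexity from the plain sum 128 to 1151.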



The linear complexity is in practice greater than 1151 because of the clock control
mechanism used in Alpha1. In the clock control mechanism, each register apart
from R1 has two clocking taps: bits 10 and 21 for R2, bits 10 and 22 for R3, and bits
11 and 24 for R4. The clocking taps, which control the operation of Alpha1, divide
each of the three registers into three almost equal parts. The clocking taps of each
register are checked in a triangular scheme; bits 10, 22, and 11 (blue colour) are
checked together with bits 21, 10, and 24 (green colour) of R2, R3, and R4
respectively, as shown in Figure 2.




Fig. 2. Clock Control Mechanism of Alpha1 (panels: clocking taps in R2, R3, and R4)



For each clock cycle, the registers whose clocking taps agree with the majority bit
in the triangle are shifted. Note here that R1 is shifted in every clock cycle. For
example, if B, C, and D represent the clocking taps of R2, R3, and R4 respectively,
then Table 2 shows all the combinations for shifting.
Table 2. Shift per Clock Register

One Clock   Majority Bit Of           Majority Bit Of           Candidates
Cycle       10, 22, 11 = B, C, D      21, 10, 24 = B, C, D      For Shift
Case 1      B=C                       B=C                       R2, R3
Case 2      B=D                       B=C                       R2
Case 3      C=D                       B=C                       R3
Case 4      B=C                       B=D                       R2
Case 5      B=D                       B=D                       R2, R4
Case 6      C=D                       B=D                       R4
Case 7      B=C                       C=D                       R3
Case 8      B=D                       C=D                       R4
Case 9      C=D                       C=D                       R3, R4
Case 10     B=C                       B=C=D                     R2, R3
Case 11     B=D                       B=C=D                     R2, R4
Case 12     C=D                       B=C=D                     R3, R4
Case 13     B=C=D                     B=C=D                     R2, R3, R4



For instance, in case 1, if the clocking taps of R2 in positions 10 and 21 agree
with the clocking taps of R3 in positions 22 and 10 respectively, then only R2 and R3
are shifted. Likewise, in case 2, if the clocking tap of R2 in position 10 agrees with
the clocking tap of R4 in position 11 and the clocking tap of R2 in position 21 agrees
with the clocking tap of R3 in position 10, then only R2 is shifted. Finally, note that
at each clock cycle R1 and at least one of R2, R3, and R4 are shifted. Registers
R2, R3 and R4 move with probability s . 
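
One way to read Table 2 is that a register is shifted when its tap agrees with the majority in both triangles, i.e. the set of shifted registers is the intersection of the two majority sets. The sketch below implements that reading; it is an interpretation that reproduces the thirteen listed cases, not a rule stated explicitly in the text, and the function names are illustrative.

```python
from itertools import product

REGISTERS = ("R2", "R3", "R4")


def majority_set(b, c, d):
    """Registers whose tap bit equals the majority value of the three tap bits."""
    majority = 1 if (b + c + d) >= 2 else 0
    return {name for name, bit in zip(REGISTERS, (b, c, d)) if bit == majority}


def registers_to_shift(first_taps, second_taps):
    """first_taps = bits (10 of R2, 22 of R3, 11 of R4); second_taps = bits (21, 10, 24).

    Assumed rule: shift the registers that are in the majority in both triangles.
    R1 is shifted on every clock cycle and is therefore not modelled here.
    """
    return majority_set(*first_taps) & majority_set(*second_taps)


# Case 2 of Table 2: B = D in the first triangle, B = C in the second one.
print(registers_to_shift((0, 1, 0), (1, 1, 0)))

# Count how often R2 moves over all 64 combinations of the six tap bits.
moves_r2 = sum("R2" in registers_to_shift(f, s)
               for f in product((0, 1), repeat=3) for s in product((0, 1), repeat=3))
print(moves_r2, "of 64")
```

Under this reading at least one of R2, R3, and R4 moves in every cycle, because two majority sets over three registers always overlap; with uniform and independent tap bits each of the three registers moves in 36 of the 64 combinations.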



3 Cryptanalysis of Alpha1

As mentioned in the introduction section, stream ciphers, such as A5/1, suffer from 
two recent cryptographic attacks, the biased birthday attack and the random subgraph 
attack, as described by Biryukov, Shamir and Wagner in the Fast Software Encryption 
Workshop 2000. The poor initialisation and choice of the clocking taps in LFSR- 
based stream ciphers make both attacks applicable by identifying special states in the 
registers. 

The clock control mechanism in Alpha1 was designed in such a way that the register
bits that affect the clock control and the register bits that affect the output are
unrelated for only 10 clock cycles. Therefore, in the biased birthday attack there is no
variance in the weights of the various states that begin with a 10-bit pattern. As a
result, the 10-bit pattern reduces the number of states that need to be sampled from
$2^{128}$ to $2^{128} \cdot 2^{-10} = 2^{118}$. In this case the biased birthday attack
becomes very inefficient, which in turn makes the random subgraph attack impractical.
The random subgraph attack is applied to the subspace of special states regardless
of whether it can be efficiently sampled or not. The time-memory tradeoff formula of
the random subgraph attack results in M = 2™ and just T = 2“* . This number of steps
makes the random subgraph attack infeasible.



4 Implementation 

Alpha1 has been implemented on a complex programmable logic device (CPLD)
using the VHSIC Hardware Description Language (VHDL). The Xilinx XC95216 chip
was used to implement the Alpha1 stream cipher. The XC95216 was the smallest CPLD
chip available from Xilinx. It has 216 macrocells and 216 registers (about 4,800
gates), of which 131 macrocells (60%) and 128 registers (59%) are used for Alpha1.
Thus, Alpha1 requires about 2,911 gates. The maximum clock speed of the XC95216 is
about 33.3 MHz.




Fig. 3. External / Internal Structure of XC95216 (CPLD) Chip for Alpha1 Stream Cipher
(panels a, b, and c)

In Figure 3, the external and internal structure of the XC95216 chip is shown. In
particular, the chip is illustrated in Figure 3a, the registers are shown in Figure 3b,
and the logic gates are illustrated in Figure 3c.
5 Conclusion

A vast number of stream ciphers have been proposed in the cryptographic
literature, and an equally vast number appear in implementations and
products worldwide. Many are based on the use of LFSRs, since such ciphers tend to
be more amenable to analysis and it is easier to assess the security that they offer.
Alpha1, which is similar to A5/1, has been designed to increase the security of
LFSR-based stream ciphers used in mobile and wireless devices. High linear complexity,
ease of implementation, and small size are the main advantages of Alpha1. Moreover,
the operation of Alpha1 has been designed to prevent special states being identified
by their output sequences [1]; this is achieved by the clock control mechanism
described, which ensures that the register bits that affect the clock control and the
register bits that affect the output are unrelated for only 6 clock cycles. This makes
the biased birthday attack and the random subgraph attack [1] inefficient. Moreover,
Anderson and Roe [6] have proposed an attack on A5/1, which uses only 3 LFSRs, based
on guessing the bits of the shorter registers and deriving the bits of the longer
register from the output. However, they have to guess the clocking sequence, and the
complexity of the attack is about 2''“ in A5/1. Such an attack on Alpha1 has
complexity greater than 2^’ because of the number of states $n = 2^{128}$. Moreover,
the high linear complexity of Alpha1 makes the attack in [2] impractical because it is
based on the solution of a system of linear equations.
Abstract. The paper considers an additive representation of the number of linear
codes. The summands of the representation are the numbers of linear codes of
weight 1, 2, ... . Formulae for calculating the summands with respect to weight
1 and weight 2 are stated. The contribution of these summands to the
whole sum is estimated. The property of strong logarithmic concavity of the
number of linear codes of minimal weight is established.



1 Introduction 

Let GF(q) be a finite field consisting of q elements. An arbitrary k-dimensional
subspace of an n-dimensional vector space over the field GF(q) will be called an [n, k]
linear code over the field GF(q), or an [n, k]-code. Denote by G(n, k) the number of all
pairwise distinct [n, k]-codes. Knowing the number of [n, k]-codes which possess
some extra properties is of interest for the problem of information coding [3]. The
weight of a linear code is considered to be such a property. (Recall that by the weight
of a linear code we mean the minimal weight of a nonzero vector that belongs to the
linear code. In turn, by the weight of an n-dimensional vector we mean the number
of its nonzero components.) Let us denote by G(n, k, ω) the number of all pairwise
distinct [n, k]-codes each of which has weight ω, ω = 1, 2, ..., n. Then the number
G(n, k) can obviously be represented in the following form

G(n, k) = G(n, k, 1) + G(n, k, 2) + ... . (1)

The paper is devoted to the statement of the properties of the summands G(n, k, 1)
and G(n, k, 2), and to the estimation of their contribution to the right-hand side of (1).



2 Formulae for G(n, k, 1) and G(n, k, 2) and Their Applications 



Theorem 1. For integers 1 ≤ k ≤ n we have

G(n, k, 1) = G(n-1, k, 1) + G(n-1, k-1, 1) (B(m) - 1) + G(n-1, k-1) , (2)

where B(a) = exp{a ln q}, a ∈ R, m = n-k, G(d, p, 1) = 0 if either p = 0 or p > d.
Theorem 2. For integers 1 ≤ k ≤ n we have

G(n, k, 2) = G(n-1, k, 2) + G(n-1, k-1, 2) (B(m) - 1) +

+ {G(n-1, k-1) - G(n-1, k-1, 1) - G(n-1, k-1, 2)} (n-1) (q-1) , (3)

where G(d, p, 2) = 0 if either p = 0 or p > d.

Theorem 3. The condition B(m) - (q-1) m - 1 < (q-1) k, where 1 ≤ k ≤ n, is necessary
and sufficient for the following relation to be valid:

G(n, k) = G(n, k, 1) + G(n, k, 2) ,

where G(n, k, 1) and G(n, k, 2) are found by means of (2) and (3) respectively.

Denote P(n, k) = G(n, k, 1)/G(n, k), 1 ≤ k ≤ n.

Theorem 4. If m = o(ln k), n → ∞, then P(n, k) → 1, n → ∞; if k = x B(m + o(1))
as n → ∞, where x is a chosen positive number, then P(n, k) → 1 - exp{-x}, n → ∞;
if, finally, k = o(B(m)), n → ∞, then P(n, k) → 0, n → ∞.

Theorem 5. If

k = y (B(m) - (q-1) m) ,

where y is a chosen number, and 1 < (q-1) y < ∞, then G(n, k, 2)/G(n, k) → exp{-y}
as n → ∞.

The proofs of Theorems 1 and 3-5 are given in [4]. Relation (3) can be verified
similarly to (2).

The main problem of the theory of linear codes is to construct a code possessing
the greatest weight ω for chosen parameters n and k. From the point of view of
progress in solving this problem, Theorem 4 can be interpreted as follows. If n
and k differ insignificantly, for instance by a quantity not exceeding o(ln k), i.e., if
n-k ≪ ln k, then, asymptotically as n → ∞, all [n, k]-codes have weight 1. If n differs
from k by a quantity of the order of ln k, then we have exponential behavior of the
ratio G(n, k, 1)/G(n, k) as n → ∞. Finally, if n ≫ k + ln k, then, asymptotically as
n → ∞, all [n, k]-codes have weight at least 2. Theorem 5 admits an analogous
interpretation.



3 Property of Strong Logarithmic Concavity 



Theorem 6 states that the numbers G(n+1, k, 1), 1 ≤ k ≤ n+1, n = 0, 1, 2, ..., satisfy
the property of strong logarithmic concavity. Note that Gaussian coefficients G(n, k),
Stirling numbers of the first and second kind, etc. satisfy this property as well (see,
for example, [1], [2], etc.).

Theorem 6. For integers 1 ≤ k ≤ n+1, n = 0, 1, 2, ...

2 ln G(n+1, k, 1) > ln [G(n+1, k-1, 1) G(n+1, k+1, 1)] , (4)

where G(d, p, 1) = 0 if either p = 0 or p > d.

We preface the proof of Theorem 6 by the following lemmas. 
3.1 Auxiliary Assertions

Lemma 1. For integers 1 ≤ k ≤ n, n = 1, 2, ...

G(n, k+1) G(n, k, 1) ≤ G(n, k) G(n, k+1, 1) . (5)

Proof. Using the well-known equality

G(n, k) = (B(n) - 1) ... (B(m+1) - 1) / ((B(k) - 1) ... (B(1) - 1)) (6)

we can rewrite (5) as

(B(m) - 1) G(n, k, 1) ≤ (B(k+1) - 1) G(n, k+1, 1) . (7)

Relation (7) is obvious at n = 1. The induction step from n-1 to n, n ≥ 2, can be
proved as follows. Let us consider 1 ≤ k ≤ n-1 (if k = n then (7) is obvious). Applying
Theorem 1 to relation (6) and using the induction assumption we obtain

(B(m-1) - 1) (B(k) + G(n-1, k-1)/G(n-1, k, 1)) ≤

≤ (B(m-1) - 1) B(k+1) + (B(k+1) - 1) G(n-1, k)/G(n-1, k, 1) . (8)

It follows from relation (6) and the estimate G(n-1, k, 1) ≤ G(n-1, k) that inequality
(8) holds true. This completes the proof of Lemma 1.

The following Lemmas 2-5 can be established similarly; however, their proofs require
more computation.

Lemma 2. For integers 1 ≤ k ≤ n, n = 1, 2, ...

G(n, k-1) G(n, k+2, 1) ≤ G(n, k) G(n, k+1, 1) .

Lemma 3. For integers 1 ≤ k ≤ n, n = 1, 2, ...

(B(m+1) - 1) G(n, k+1) G(n, k-1, 1) ≤ (B(m) - 1) G(n, k) G(n, k, 1) .

Lemma 4. For integers 1 ≤ k ≤ n, n = 1, 2, ...

(B(m-1) - 1) G(n, k-1) G(n, k+1, 1) ≤ (B(m) - 1) G(n, k) G(n, k, 1) .

Lemma 5. For integers 1 ≤ k ≤ n, n = 1, 2, ...

(B(m+2) - 1) G(n, k-2, 1) G(n, k+1, 1) ≤

≤ (B(m+1) - 1) G(n, k-1, 1) G(n, k, 1) .

3.2 Proof of Theorem 6 

If n = 0 then relation (4) is obvious. Let us prove the induction step from n-1 to n,
n ≥ 1. Using Theorem 1 we can represent the left-hand and right-hand sides of
relation (4) in the form

G^2(n+1, k, 1) = D_1 + ... + D_6 , (9)

where

D_1 = G^2(n, k, 1) ,

D_2 = (G(n, k-1, 1) (B(m+1) - 1))^2 ,

D_3 = G^2(n, k-1) ,

D_4 = 2 G(n, k, 1) G(n, k-1, 1) (B(m+1) - 1) ,

D_5 = 2 G(n, k, 1) G(n, k-1) ,

D_6 = 2 G(n, k-1, 1) G(n, k-1) (B(m+1) - 1) ,

and

G(n+1, k-1, 1) G(n+1, k+1, 1) = F_1 + ... + F_9 , (10)

where

F_1 = G(n, k-1, 1) G(n, k+1, 1) ,

F_2 = G(n, k-1, 1) G(n, k, 1) (B(m) - 1) ,

F_3 = G(n, k-1, 1) G(n, k) ,

F_4 = G(n, k-2, 1) G(n, k+1, 1) (B(m+2) - 1) ,

F_5 = G(n, k-2, 1) G(n, k, 1) (B(m+2) - 1) (B(m) - 1) ,

F_6 = G(n, k-2, 1) G(n, k) (B(m+2) - 1) ,

F_7 = G(n, k-2) G(n, k+1, 1) ,

F_8 = G(n, k-2) G(n, k, 1) (B(m) - 1) ,

F_9 = G(n, k-2) G(n, k) .

By virtue of the induction assumption we have

D_1 ≥ F_1 , (11)

and, taking into consideration the inequality

(B(m+1) - 1)^2 > (B(m+2) - 1) (B(m) - 1) ,

D_2 ≥ F_5 . (12)

Using the property of strong logarithmic concavity of Gaussian coefficients G(n, k)
we get

D_3 > F_9 . (13)

It follows from Lemma 5 that

D_4 ≥ F_2 + F_4 . (14)

Lemmas 1 and 2 imply

D_5 ≥ F_3 + F_7 . (15)

And, finally, Lemmas 3 and 4 yield

D_6 ≥ F_6 + F_8 . (16)

Relations (9)-(16) prove (4). This completes the proof of Theorem 6.



4 Problems and Examples 

For ω = q = 2 and k ≤ n ≤ 10 the numbers G(n, k, 2) are given in Table 1 and Table 2.
(The numbers in the tables are found by means of relations (2), (3) and (6).)

Table 1. Numbers G(n, k, 2) for 2 ≤ n ≤ 6

 k \ n      2      3      4      5      6
  1         1      3      6     10     15
  2         0      1     13     75    305
  3         0      0      1     40    640
  4         0      0      0      1    121
  5         0      0      0      0      1
  6         0      0      0      0      0
  7         0      0      0      0      0
  8         0      0      0      0      0
  9         0      0      0      0      0
 10         0      0      0      0      0



Table 2. Numbers G(n, k, 2) for 7 ≤ n ≤ 10

 k \ n         7          8          9         10
  1           21         28         36         45
  2         1022       3038       8346      21720
  3         6265      46522     292068    1647480
  4         4781     109991    1800213   23830197
  5          364      34041    1786866   64127334
  6            1       1093     239380   27853180
  7            0          1       3280    1678940
  8            0          0          1       9841
  9            0          0          0          1
 10            0          0          0          0



Making use of Table 1 and Table 2, it is easy to show that for ω = q = 2 and k ≤ n ≤
10 the numbers G(n, k, 2) satisfy the property of strong logarithmic concavity.

The question of whether the numbers G(n, k, ω), ω > 2, satisfy the property of
strong logarithmic concavity is still open.
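
The recurrences (2) and (3), together with the product formula (6), translate directly into a short program. The sketch below is one possible transcription for q = 2; the function names `gauss`, `g1` and `g2` are illustrative, and the boundary conventions (zero whenever k ≤ 0 or k > n) follow the statements of Theorems 1 and 2.

```python
from functools import lru_cache

Q = 2  # field size q; the tables in the text use q = 2


def gauss(n, k, q=Q):
    """Gaussian coefficient G(n, k) computed from the product formula (6)."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


@lru_cache(maxsize=None)
def g1(n, k, q=Q):
    """G(n, k, 1): number of [n, k]-codes of weight 1, from recurrence (2)."""
    if k <= 0 or k > n:
        return 0
    m = n - k
    return g1(n - 1, k, q) + g1(n - 1, k - 1, q) * (q ** m - 1) + gauss(n - 1, k - 1, q)


@lru_cache(maxsize=None)
def g2(n, k, q=Q):
    """G(n, k, 2): number of [n, k]-codes of weight 2, from recurrence (3)."""
    if k <= 0 or k > n:
        return 0
    m = n - k
    rest = gauss(n - 1, k - 1, q) - g1(n - 1, k - 1, q) - g2(n - 1, k - 1, q)
    return g2(n - 1, k, q) + g2(n - 1, k - 1, q) * (q ** m - 1) + rest * (n - 1) * (q - 1)


# Print the columns n = 2..7 of the tables, rows k = 1..n.
for n in range(2, 8):
    print(n, [g2(n, k) for k in range(1, n + 1)])
```

The printed rows can be compared entry by entry with Table 1 and with the n = 7 column of Table 2; extending the range to n = 10 gives the remaining columns.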








References 

1. Kurtz, D.C.: A Note on Concavity Properties of Triangular Arrays of Numbers. J. Comb.
Theory, Vol. 13 (1972) 135-139

2. Lieb, E.H.: Concavity Properties and a Generating Function for Stirling Numbers. J. Comb.
Theory, Vol. 5 (1968) 203-206

3. MacWilliams, F.J., Sloane, N.J.A.: The Theory of Error-Correcting Codes (1977)

4. Masol, V.I.: Asymptotic Behaviour of the Number of Certain k-dimensional Subspaces over
a Finite Field. Mathematical Notes, Vol. 59 (1996) 525-530




Statistical Physics of Low Density Parity Check 
Error Correcting Codes 



David Saad^1, Yoshiyuki Kabashima^2, Tatsuto Murayama^2, and
Renato Vicente^3

^1 Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK.
^2 Dept. of Comp. Intel. & Syst. Sci., Tokyo Institute of Technology, Yokohama
2268502, Japan.
^3 Dep. de Física Geral, Instituto de Física, Universidade de São Paulo,
Caixa Postal 66318, 05315-970 São Paulo - SP, Brazil.



Abstract. We study the performance of Low Density Parity Check 
(LDPC) error-correcting codes using the methods of statistical physics. 

LDPC codes are based on the generation of codewords using Boolean 
sums of the original message bits by employing two randomly- 
constructed sparse matrices. These codes can be mapped onto Ising spin 
models and studied using common methods of statistical physics. We ex- 
amine various regular constructions and obtain insight into their theoret- 
ical and practical limitations. We also briefly report on results obtained 
for irregular code constructions, for codes with non-binary alphabet, and 
on how a finite system size affects the error probability.

1 Introduction 

Modern telecommunication relies heavily on error correcting mechanisms to com- 
pensate for corruption due to noise during transmission. The information trans- 
mission code rate, measured in the fraction of informative transmitted bits, plays 
a crucial role in determining the speed of communication channels. Rigorous 
bounds [1] have been derived for the maximal code rate for which codes, capable 
of achieving arbitrarily small error probability, can be found. However, these 
bounds are not constructive and most existing practical error-correcting codes 
are far from saturating them. 

Two code families currently achieve the highest information transmission
rates for a given corruption level, especially in the high code rate regime. Turbo
codes [2] were introduced less than a decade ago, and were followed by the
rediscovery of Low Density Parity Check (LDPC) codes [3]. The latter were
originally introduced by Gallager [4] in 1962, and abandoned in favour of other
codes due to the limited computing facilities of the time. Both codes show
excellent performance, and recently discovered irregular LDPC constructions nearly
saturate Shannon’s bound for infinite message size [5].

LDPC codes are generally based on the introduction of random sparse matri- 
ces for generating the transmitted codeword as well as for decoding the received 
corrupted codeword. Two main types of matrices have been studied: regular con-
structions, where the number of non-zero row/column elements in these matrices 
remains fixed; and irregular constructions where it can vary from row to row or 
column to column. Various decoding methods have been successfully employed; 
we will mainly refer here to the leading decoding techniques based on Belief 
Propagation (BP) [6]. 

Most analyses of LDPC codes have been obtained via methods of information 
theory, backed up by numerical simulations. These rely on deriving upper and 
lower bounds for the performance of codes, with or without making assumptions 
about the code used. These bounds represent a worst case analysis, and may be 
tight or loose depending on the accuracy and restrictiveness of the assumptions 
used, and the specific difference between the worst and typical cases. 

The statistical physics based analysis takes a different approach, analysing 
directly the typical case, making use of explicit assumptions about the code used 
and its macroscopic characteristics. Moreover, using methods adopted from sta- 
tistical physics of Ising spin systems, one can actually carry out averages over 
ensembles of codes with the same macroscopic properties to obtain exact
performance estimates in the limit of infinitely large systems. Two methods have
been used in particular: the replica method and the Bethe approximation [7],
which is also linked to the Thouless-Anderson-Palmer (TAP) approach [8] to
diluted systems. In this paper we will review recent studies of LDPC codes, using
a statistical physics based analysis. We focus on two specific codes, Gallager’s 
original LDPC code [4] and the MN code [3] where messages are represented 
by binary vectors and are communicated through a Binary Symmetric Channel 
(BSC) where uncorrelated bit flips appear with probability p. 

A Gallager code is defined by a binary matrix A = [A | B], concatenating
two very sparse matrices known to both sender and receiver, with B (of
dimensionality (M − N) × (M − N)) being invertible; the matrix A is of dimensionality
(M − N) × N.

Encoding refers to the production of an M-dimensional binary codeword
$t \in \{0, 1\}^M$ (M > N) from the original message $\xi \in \{0, 1\}^N$ by
$t = G^T \xi$ (mod 2), where all operations are performed in the field {0, 1} and
are modulo 2. The generator matrix is $G = [I \mid B^{-1}A]$ (mod 2), where I is the
N × N identity matrix, implying that $AG^T = 0$ (mod 2) and that the first N bits of
t are set to the message $\xi$. In regular Gallager codes the number of non-zero
elements in each row of A is chosen to be exactly K. The number of elements per
column is then $C = (1 - R)K$, where the code rate is $R = N/M$ (for unbiased
messages). The encoded vector t is then corrupted by noise represented by the vector
$\zeta \in \{0, 1\}^M$ with components independently drawn with probability
$P(\zeta) = (1 - p)\,\delta(\zeta) + p\,\delta(\zeta - 1)$. The received vector takes
the form $r = G^T \xi + \zeta$ (mod 2).
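
The encoding and syndrome steps can be illustrated on a toy example. The sketch below uses small hypothetical matrices A and B chosen only for illustration (they are not sparse in any meaningful sense and do not come from the source), builds the parity-check matrix [A | B], encodes a message as the message bits followed by B^{-1}A applied to the message, and checks that the syndrome of the received vector depends only on the noise. The helper names are illustrative.

```python
def mat_vec(mat, vec):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, vec)) % 2 for row in mat]


def solve_gf2(mat, rhs):
    """Solve mat * x = rhs over GF(2) by Gauss-Jordan elimination (mat square, invertible)."""
    n = len(mat)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] == 1)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] == 1:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Hypothetical toy matrices with N = 3 and M = 6: A is (M-N) x N, B is (M-N) x (M-N).
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
B = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
H = [a + b for a, b in zip(A, B)]          # parity-check matrix [A | B]

xi = [1, 0, 1]                             # original message
t = xi + solve_gf2(B, mat_vec(A, xi))      # codeword: message followed by B^{-1} A xi
zeta = [0, 0, 0, 0, 1, 0]                  # noise vector flipping a single bit
r = [x ^ n for x, n in zip(t, zeta)]       # received vector
z = mat_vec(H, r)                          # syndrome of the received vector

print(t, mat_vec(H, t))                    # every codeword has zero syndrome
print(z, mat_vec(H, zeta))                 # the syndrome equals that of the noise alone
```

Because every codeword is annihilated by the parity-check matrix, decoding reduces to estimating the noise vector from the syndrome, which is the problem treated in the following paragraphs.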

Decoding is carried out by multiplying the received message by the matrix A
to produce the syndrome vector $z = Ar = A\zeta$ (mod 2), from which an estimate
$\hat{\tau}$ for the noise vector can be produced. An estimate for the original
message is then obtained as the first N bits of $r + \hat{\tau}$ (mod 2). The Bayes
optimal estimator (also known as marginal posterior maximiser, MPM) for the noise is
defined as $\hat{\tau}_j = \mathrm{argmax}_{\tau_j} P(\tau_j \mid z)$, where
$\tau_j \in \{0, 1\}$. The performance of this estimator can be measured by the
probability of bit error $p_b = 1 - \frac{1}{M} \sum_{j=1}^{M} \delta[\hat{\tau}_j; \zeta_j]$,
where $\delta[;]$ is Kronecker’s delta. Knowing the matrices B and A, the syndrome
vector z and the noise level p, it is possible to apply Bayes’ theorem and compute
the posterior probability

$$P(\tau \mid z) = \frac{1}{Z}\, \chi[z = A\tau \ (\mathrm{mod}\ 2)]\, P(\tau), \qquad (1)$$

where $\chi[X]$ is an indicator function providing 1 if X is true and 0 otherwise.
To compute the MPM one has to compute the marginal posterior
$P(\tau_j \mid z) = \sum_{\{\tau_i : i \neq j\}} P(\tau \mid z)$, which in general
requires $O(2^M)$ operations, thus becoming impractical for long messages. To solve
this problem one can use the sparseness of A to design algorithms that require
$O(M)$ operations to perform the same task. One of these methods is the probability
propagation algorithm, also known as belief propagation (BP) [6].

The MN code has a similar structure, except that the generator matrix is
$G = B^{-1}A$. The randomly selected sparse matrices A and B are of
dimensionality M × N and M × M respectively; these are characterized by K
and L non-zero unit elements per row and C and L per column respectively.
Correspondingly, the code rate becomes R = N/M = K/C. Decoding is carried
out by taking the product of the matrix B and the received message, $z = Br$
(mod 2). The equation

$$z = A\xi + B\zeta = AS + B\tau \pmod 2, \qquad (2)$$

is solved via the iterative methods of BP [3] to obtain the most probable Boolean
vectors S and τ; the posterior probability (1) becomes slightly more elaborate,
including two sets of free variables S and τ and two priors.

2 Statistical Physics 

To facilitate the statistical physics analysis we replace the {0, 1} representation
by the conventional Ising spin {1, -1} representation, and mod 2 sums
by products [9]. For instance, in Gallager's code, the syndrome vector acquires
the form of a multi-spin coupling $J_\mu = \prod_{j \in \mathcal{L}(\mu)} \zeta_j$, where $j = 1, \ldots, M$ and
$\mu = 1, \ldots, (M - N)$. The K indices of nonzero elements in the row $\mu$ of a matrix
A, that is not necessarily a concatenation of two matrices (therefore defining
a non-structured Gallager code), are given by $\mathcal{L}(\mu) = \{j_1, \ldots, j_K\}$, and in a
column $l$ are the C indices given by $\mathcal{M}(l) = \{\mu_1, \ldots, \mu_C\}$.

The posterior (1) can be written as the Gibbs distribution [10]: 

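$P(\tau \mid J) = \frac{1}{Z} \lim_{\beta \to \infty} \exp\left[-\beta \mathcal{H}_\beta(\tau; J)\right]$ (3)

$\mathcal{H}_\beta(\tau; J) = -\sum_{\mu=1}^{M-N} \left( J_\mu \prod_{j \in \mathcal{L}(\mu)} \tau_j - 1 \right) - \frac{F}{\beta} \sum_{j=1}^{M} \tau_j$,

where $\mathcal{H}$ is the Hamiltonian of the system.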
The quantity that one concentrates on, in the statistical physics based analysis,
is the free energy, which is linked to the probability of finding the system in a
specific configuration. In the thermodynamic limit of infinite system size, which 
is the main case considered in this work, the state of the system is dominated by 
configurations with the lowest free energy; finite systems are more likely to be 
found in configurations with lower free energy, but may also be found in other 
configurations with some probability. 

To investigate the typical properties of a model, we calculate the partition
function $Z(A, J) = \mathrm{Tr}_{\{\tau\}} \exp[-\beta \mathcal{H}]$ and the free energy $\langle \ln Z(A, J) \rangle_{A,\zeta}$ by
averaging over the randomness induced by the specific code matrix A and the
true noise vector $\zeta$. For carrying out these averages we use the replica method [10]
or the Bethe approximation [11]; both methods provide the same results.

The replica method makes use of the identity $\langle \ln Z \rangle = \lim_{n \to 0} \frac{1}{n} \left[ \langle Z^n \rangle - 1 \right]$,
by calculating averages over a product of partition function replicas. Employing
assumptions about replica symmetries and analytically continuing the variable 
n to zero, one obtains solutions which enable one to determine the state of the 
system. The Bethe approximation is based on a consistent solution to a tree 
based expansion for calculating the free energy. Details of the techniques used 
and of the calculations themselves can be obtained in [7] and in the corresponding 
papers [10] and [11]. 



3 Results 

Once the free energy for the possible solutions is calculated, one can identify
the stable dominant solutions and their overlap m with the true noise/signal
vectors. In the case of Gallager's code we monitor $m = \frac{1}{M} \sum_{j=1}^{M} \zeta_j \hat{\tau}_j$,
where $\hat{\tau}$ is the noise vector MPM estimate. In the case of MN we calculate
$m = \frac{1}{N} \sum_{i=1}^{N} \xi_i \hat{S}_i$, estimating the signal vector $S$.

One observes three types of solutions: perfect retrieval (ferromagnetic solution)
m = 1; catastrophic failure (paramagnetic solution) m = 0; and partial
failure (sub-optimal ferromagnetic solution) 0 < m < 1.

In each case one identifies two main critical noise levels: the spinodal point $p_s$,
the noise level below which only perfect (ferromagnetic) solutions exist; and $p_t$,
the noise level above which the ferromagnetic solution is no longer dominant. The
former marks the practical decoding limit, as current practical decoding methods
fail above $p_s$, while the latter marks the theoretical limits of the system.

The results obtained for the R = 1/4 Gallager code are shown in Fig. 1a, where
we present the theoretical mean overlap between the actual noise vector $\zeta$ and
the estimate $\hat{\tau}$ as a function of the noise level p, as well as results obtained using
BP decoding. In Fig. 1b we show the thermodynamic transition for K = 6 and
R = 1/2, compared with the theoretical upper bound, Shannon's bound and the
theoretical $p_s$ values.

Results obtained for MN code with various K, L values are presented in Fig. 2:
on the left, a schematic description of the free energy surface for various K
values; on the right, a description of the existing solutions for each noise value p
and their corresponding overlap m.

Fig. 1. (a) Mean normalized overlap between the actual noise vector $\zeta$ and decoded
noise $\hat{\tau}$ for K = 4 and C = 3 (therefore R = 1/4). Theoretical values (squares),
experimental averages over 20 runs for code word lengths M = 5000 (•) and M = 100 (full
line). (b) Transitions for K = 6. Shannon's bound (dashed line), information theory
based upper bound (full line) and thermodynamic transition obtained numerically (o).
Theoretical (diamond) and experimental (+, M = 5000 averaged over 20 runs) BP
decoding transitions are also shown. In both figures, symbols are chosen larger than
the error bars.

For unbiased messages with $K \geq 3$ and L > 1, we obtain both the ferromagnetic
and paramagnetic solutions either by applying the TAP approach or by
solving the saddle point equations numerically. The former was carried out at
the values of $F_\tau$ and $F_s$ ($= 0$) which correspond to the true noise and input bias
levels (for unbiased messages $F_s = 0$) and thus to Nishimori's condition [12]. The
latter is equivalent to having the correct prior within the Bayesian framework [9].

The most interesting quantity to examine is the maximal code rate, for a
given corruption process, for which messages can be perfectly retrieved. This
is defined in the case of $K \geq 3$ by the value of R = K/C = N/M for which
the free energy of the ferromagnetic solution becomes smaller than that of the
paramagnetic solution, constituting a first order phase transition. The critical
code rate obtained, $R_c = 1 - H_2(p) = 1 + (p \log_2 p + (1 - p) \log_2 (1 - p))$, coincides
with Shannon's capacity.
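As a small numerical illustration of the critical code rate above, the following sketch evaluates $R_c = 1 - H_2(p)$ for a binary symmetric channel with flip probability p. The function names are illustrative choices, not notation from the paper.

```python
import math

def binary_entropy(p):
    """Binary entropy H_2(p) in bits, with the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def critical_rate(p):
    """Critical code rate R_c = 1 - H_2(p), i.e. Shannon's capacity of the BSC."""
    return 1.0 - binary_entropy(p)
```

For example, `critical_rate(0.11)` is very close to 1/2, so a rate-1/2 code can at best tolerate a flip probability of roughly 0.11.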

The MN code for $K \geq 3$ seems to offer optimal performance. However, the
main drawback is rooted in the co-existence of the stable m = 1, 0 solutions,
which implies that from most initial conditions the system will converge to the
undesired paramagnetic solution. Studying the ferromagnetic solution numerically
shows a highly limited basin of attraction, which becomes smaller as K
and L increase, while the paramagnetic solution at m = 0 always enjoys a wide
basin of attraction.
Fig. 2. Left hand figures show a schematic representation of the free energy landscape
while figures on the right show the ferromagnetic, sub-optimal ferromagnetic
and paramagnetic solutions as functions of the noise rate p; thick and thin lines denote
stable solutions of lower and higher free energies respectively, dashed lines correspond
to unstable solutions. In all cases considered L > 1. (a) $K \geq 3$; the solid line in the
horizontal axis represents the phase where the ferromagnetic solution (F, m = 1) is
thermodynamically dominant, while the paramagnetic solution (P, m = 0) becomes
dominant for the other phase (dashed line). The critical noise $p_c$ denotes Shannon's
channel capacity. (b) K = 2; the ferromagnetic solution and its mirror image are the
only minima of the free energy over a relatively small noise level (the solid line in the
horizontal axis). The critical point, due to dynamical considerations, is the spinodal point
$p_s$ where sub-optimal ferromagnetic solutions (F', m < 1) emerge. The thermodynamic
transition point ps, at which the ferromagnetic solution loses its dominance, is below 
the maximum noise level given by the channel capacity, which implies that these codes
do not saturate Shannon's bound even if optimally decoded. (c) K = 1; the solid line
in the horizontal axis represents the range of noise levels where the ferromagnetic state
(F) is the only minimum of the free energy. The sub-optimal ferromagnetic state (F')
appears in the region represented by the dashed line. The spinodal point $p_s$, where the F'
solution first appears, provides the highest noise value in which convergence to the
ferromagnetic solution is guaranteed. For higher noise levels, the system becomes bistable
and an additional unstable solution for the saddle point equations necessarily appears.
A thermodynamical transition occurs at the noise level $p_1$ where the state F' becomes
dominant.
Studying the case of K = 2 and L > 1 indicates the existence of paramagnetic,
ferromagnetic and sub-optimal ferromagnetic solutions depicted in Fig. 2b.
For corruption probabilities $p > p_s$ one obtains either a dominant paramagnetic
solution or a mixture of ferromagnetic ($m = \pm 1$) and paramagnetic (m = 0)
solutions. Reliable decoding may only be obtained for $p < p_s$, which corresponds
to a spinodal point, where a unique ferromagnetic solution emerges at m = 1
(plus a mirror solution at m = -1). Initial conditions for BP decoding can be
chosen randomly, with a slight bias in the initial magnetization. The results
obtained point to the existence of a unique pair of global solutions to which
the system converges (below $p_s$) from all initial conditions. Similarly, the case
of K = 1, L > 1 presented in Fig. 2c shows a dominant ferromagnetic solution
below $p_s$ and the emergence of a sub-optimal ferromagnetic solution above it,
which becomes dominant at $p_1$.

The main differences between the results obtained for Gallager and MN codes
in the case of unbiased messages are as follows. While Gallager's code allows for
sub-optimal practical decoding for any K value, it saturates Shannon's bound
only in the limit of $K \to \infty$. On the other hand, MN codes can theoretically
saturate Shannon's limit for constructions with $K \geq 3$, which are of no practical
value, but they can only achieve suboptimal performance for regular configurations
with K = 1, 2.

It should be pointed out that these results are valid only in the case of unbiased
signal vectors $\xi$. A different picture emerges in the case of biased messages;
this includes the emergence of a spinodal point also in the case of $K \geq 3$ MN
codes and a decrease in the noise level of the thermodynamic transition to below
Shannon's limit.

It has been shown that irregular LDPC constructions can achieve better practical
performance (e.g. [5,13]). In analytical studies based on the same framework
presented here [14], we investigated the position of both critical points $p_s$
and $p_t$ with respect to Shannon's limit and their values in regular constructions.
We show that improved irregular constructions correspond to models with higher
$p_s$ values while the position of $p_t$ changes only slightly. The possibility of employing
the statistical physics based analysis for providing a principled method
to optimise the code construction is still an open question.



4 Related Studies 

We also studied the effect of a non-binary alphabet on the performance of LDPC
codes [15], as it seems to offer improved performance in many cases [16]. The alphabet
used in this study is defined over the Galois field GF(q) [17]. Our results show
that Gallager codes of this type saturate Shannon's limit as $C \to \infty$ irrespective
of the value of q. For finite C, these codes exhibit two different behaviours for
$C \geq 3$ and C = 2. For $C \geq 3$, we show that the theoretical error correcting
ability of these codes is monotonically improving as q increases, i.e., the value of
$p_t$ increases with q for a given configuration. The practical decoding limit, determined
by the emergence of a suboptimal solution and the value of $p_s$, decreases
with q. On the other hand, C = 2 codes exhibit a continuous transition from
optimal to sub-optimal solutions at a certain noise level, below which practical
BP decoding converges to the (unique) optimal solution. This critical noise level
monotonically increases with q and becomes even higher than that of some codes
of connectivity $C \geq 3$, while the optimal decoding performance is inferior to that
of $C \geq 3$ codes with the same q value.

The work described so far is limited to the case of infinite message length. In
finite systems there is some probability of finding the system in a non-dominant
state, which translates to an error probability that vanishes exponentially with
the system size. Significant effort has been dedicated to bounding the reliability
exponent in the information theory literature [18]; we have also studied the
reliability exponent [19] by carrying out direct averages over ensembles of Gallager
codes, characterised by finite and infinite K values. In the limit of infinite
connectivity our result collapses onto the best general random coding exponents
reported in the IT literature, the random coding exponent and the expurgated
exponent for high and low R values respectively. The method provides one of
the few tools available for examining codes of finite connectivity, and predicts
the tightest estimate of the zero error noise level threshold to date for Gallager
codes. It can be easily extended to investigate other linear codes of a similar
type and is clearly of high practical significance.

Finally, insight gained from the analysis led us to suggest the potential use
of a similar system as a public-key cryptosystem [20]. The cryptosystem is based
on an MN code where the matrix G and a corruption level $p < p_s$ play the role
of the public key and the matrices used to generate G play the role of the secret
key and are known only to the authorised user.

In the suggested cryptosystem, a plaintext represented by an N dimensional
Boolean vector $\xi \in \{0, 1\}^N$ is encrypted to the M dimensional Boolean ciphertext
$J$ using a predetermined Boolean matrix G, of dimensionality M x N, and a
corrupting M dimensional vector $\zeta$, whose elements are 1 with probability p and
0 otherwise, in the following manner: $J = G\xi + \zeta$, where all operations are
(mod 2). The corrupting vector $\zeta$ is chosen at the transmitting end. The matrix
G, which is at the heart of the encryption/decryption process, is constructed
by choosing two randomly-selected sparse matrices A (M x N) and B (M x
M), and a dense matrix D (N x N), defining $G = B^{-1}AD$ (mod 2). The
matrices A and B are similar to those used in other MN constructions; the
dense invertible Boolean matrix D is arbitrary and is added for improving the
system's security. Authorised decryption follows a similar procedure to decoding
corrupted messages in LDPC codes (i.e., using BP), while an unauthorised user
will find the decryption to be computationally hard [20].
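The encryption step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the matrices below are tiny hand-picked examples rather than large random sparse matrices, the helper names (`matmul2`, `matvec2`, `inverse2`, `encrypt`) are hypothetical, and the BP decryption step is not implemented. The sketch only shows how $G = B^{-1}AD$ and $J = G\xi + \zeta$ are formed over GF(2).

```python
import random

def matmul2(X, Y):
    """Matrix product over GF(2); matrices are lists of rows of 0/1."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*Y)]
            for row in X]

def matvec2(X, v):
    """Matrix-vector product over GF(2)."""
    return [sum(x * y for x, y in zip(row, v)) % 2 for row in X]

def inverse2(B):
    """Inverse over GF(2) by Gauss-Jordan elimination; ValueError if singular."""
    n = len(B)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(B)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def encrypt(G, xi, p, rng):
    """Return (J, zeta) with J = G xi + zeta (mod 2) and zeta_i = 1 w.p. p."""
    zeta = [1 if rng.random() < p else 0 for _ in G]
    J = [(g + z) % 2 for g, z in zip(matvec2(G, xi), zeta)]
    return J, zeta

# Toy secret key (illustrative only): M = 4, N = 2
B = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
A = [[1, 0], [0, 1], [1, 1], [1, 0]]
D = [[1, 1], [0, 1]]
G = matmul2(matmul2(inverse2(B), A), D)   # public matrix G = B^{-1} A D
J, zeta = encrypt(G, [1, 0], p=0.1, rng=random.Random(0))
```

The authorised user, knowing B, can form $BJ = AD\xi + B\zeta$ (mod 2), which has the same structure as the MN decoding equation (2).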



5 Conclusions 



We showed how the methods of statistical physics can be employed to investigate 
error-correcting codes and related areas, by studying the typical case characteristics
of a given system. This approach provides a unique insight by examining




Statistical Physics of Low Density Parity Check Error Correcting Codes 315 



macroscopic properties of stochastic systems, carrying out explicit averages over 
ensembles of codes that share the same macroscopic properties. 

The results obtained shed light on the properties that limit the theoretical
and practical performance of parity check codes, explain the differences between
Gallager and MN constructions, and explore the role of irregularity, finite size effects
and non-binary alphabets in LDPC constructions.

We believe that methods developed over the years in the statistical physics 
community can make a significant contribution also in other areas of information 
theory. Research in some of these areas, such as CDMA and image restoration 
is currently underway. 
Abstract. In 1999, Gong and Harn proposed a new cryptosystem based
on third-order characteristic sequences over finite fields. This paper gives 
an efficient method to generate instances of this cryptosystem over large 
finite fields. The method first finds a “good” prime p to work with and 
then constructs the sequence to ensure that it has the desired period. 
This method has been implemented in C++ using NTL [7], and timing
results are presented.



1 Introduction 

In 1998 and 1999, Gong and Harn proposed a new public-key cryptosystem
(PKC) based on third-order characteristic sequences of period $q^2 + q + 1$ over
$\mathbb{F}_q$, where q is a power of a prime, published in ChinaCrypt'98 [1] and in the
IEEE Transactions on Information Theory [2], respectively. The security of the
PKC is based on the difficulty of solving the discrete logarithm (DL) in $\mathbb{F}_{q^3}$.
In 2000, Lenstra and Verheul [3] proposed the XTR public-key cryptosystem at
Crypto 2000, which is based on the third-order characteristic sequences with
period $p^2 - p + 1$ by taking $q = p^2$. However, for large values of p where $q = p^2$,
it seems to be difficult to check whether or not a sequence has the correct period
$q^2 + q + 1$ in the GH-PKC. In this paper, we give a method for constructing such
instances which are assured to have the desired properties for the case $q = p^2$.
The more general case q = p'" will appear in the full paper. 

2 Third-Order Characteristic Sequences and the 
Gong-Harn Cryptosystem 

In this section, we present a review for 3rd-order characteristic sequences over 
finite fields and the GH Diffie-Hellman key agreement protocol. 

2.1 3rd-Order Characteristic Sequences 

Let

$f(x) = x^3 - ax^2 + bx - 1, \quad a, b \in \mathbb{F}_q$ (1)

B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 317-328, 2001. 

© Springer- Verlag Berlin Heidelberg 2001 
be an irreducible polynomial over $\mathbb{F}_q$ and $\alpha$ be a root of f(x) in the extension field
$\mathbb{F}_{q^3}$. A sequence $\{s_i\}$ is said to be a 3rd-order characteristic sequence generated
by f(x) if the initial state of $\{s_i\}$ is given by

$s_0 = 3$, $s_1 = a$, and $s_2 = a^2 - 2b$

and

$s_{k+3} = a s_{k+2} - b s_{k+1} + s_k, \quad k = 0, 1, \ldots$

In this case, the trace representation of $\{s_i\}$ is as follows:

$s_k = \mathrm{Tr}(\alpha^k) = \alpha^k + \alpha^{kq} + \alpha^{kq^2}, \quad k = 0, 1, 2, \ldots$

In the following, we write $s_k = s_k(a, b)$ or $s_k(f)$ to indicate the generating
polynomial. Let $f^{-1}(x) = x^3 - bx^2 + ax - 1$, which is the reciprocal polynomial
of f(x). Let $\{s_k(b, a)\}$ be the characteristic sequence of $f^{-1}(x)$, called the reciprocal
sequence of $\{s_k(a, b)\}_{k \geq 0}$. Then we have $s_{-k}(a, b) = s_k(b, a)$, $k = 1, 2, \ldots$.
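The definitions above can be checked on a toy example. The sketch below works over a prime field $\mathbb{F}_p$ (the case q = p) and uses naive iteration; the function names are illustrative assumptions. It generates the sequence from the recurrence and obtains the terms with negative index by running the recurrence backwards, so that the relation $s_{-k}(a, b) = s_k(b, a)$ can be verified numerically.

```python
def char_sequence(a, b, p, n):
    """Terms s_0, ..., s_{n-1} of the 3rd-order characteristic sequence of
    f(x) = x^3 - a x^2 + b x - 1 over the prime field F_p."""
    s = [3 % p, a % p, (a * a - 2 * b) % p]
    while len(s) < n:
        s.append((a * s[-1] - b * s[-2] + s[-3]) % p)
    return s[:n]

def negative_terms(a, b, p, n):
    """Terms s_{-1}, ..., s_{-n}, from the recurrence solved for the oldest term:
    s_k = s_{k+3} - a s_{k+2} + b s_{k+1}."""
    window = [3 % p, a % p, (a * a - 2 * b) % p]   # (s_{k+1}, s_{k+2}, s_{k+3})
    out = []
    for _ in range(n):
        prev = (window[2] - a * window[1] + b * window[0]) % p
        out.append(prev)
        window = [prev, window[0], window[1]]
    return out
```

With p = 5, a = 1, b = 0 the polynomial $x^3 - x^2 - 1$ has no root in $\mathbb{F}_5$ and is therefore irreducible; its sequence repeats after $5^2 + 5 + 1 = 31$ terms, and `negative_terms(1, 0, 5, 10)` agrees with `char_sequence(0, 1, 5, 11)[1:]`.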



2.2 The GH Diffie-Hellman Key Agreement Protocol

Note that in [2], the GH-DH was presented in $\mathbb{F}_p$. However, these results are
also true in $\mathbb{F}_q$ where q is a power of a prime.

GH-DH Key Agreement Protocol (Gong and Harn, 1999) [2]:

System parameters: p is a prime number, q is a power of p, and $f(x) = x^3 - ax^2 + bx - 1$
is an irreducible polynomial over $\mathbb{F}_q$ with period $Q = q^2 + q + 1$.

User Alice chooses e, 0 < e < Q, with gcd(e, Q) = 1 as her private key and
computes $(s_e, s_{-e})$ as her public key. Similarly, user Bob has r, 0 < r < Q,
with gcd(r, Q) = 1 as his private key and $(s_r, s_{-r})$ as his public key. In the key
distribution phase, Alice uses Bob's public key to form a polynomial:

$g(x) = x^3 - s_r x^2 + s_{-r} x - 1$

and then computes the eth terms of a pair of reciprocal char-sequences generated
by g(x). I.e., Alice computes

$s_e(s_r, s_{-r})$ and $s_{-e}(s_r, s_{-r})$.

Similarly, Bob computes

$s_r(s_e, s_{-e})$ and $s_{-r}(s_e, s_{-e})$.

They share the common secret key $(s_{er}, s_{-er})$.
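The exchange can be traced on toy parameters. The following sketch assumes q = p = 5 with a = 1, b = 0 (so Q = 31) and uses naive term-by-term iteration, which is only sensible for such small sizes; the helper names `s_term` and `s_pair` are hypothetical, and a practical implementation would use a fast exponentiation-like algorithm for sequence terms instead.

```python
def s_term(a, b, k, p):
    """k-th term (k >= 0) of the characteristic sequence of
    x^3 - a x^2 + b x - 1 over F_p, by naive iteration (toy sizes only)."""
    s = [3 % p, a % p, (a * a - 2 * b) % p]
    if k <= 2:
        return s[k]
    for _ in range(k - 2):
        s = [s[1], s[2], (a * s[2] - b * s[1] + s[0]) % p]
    return s[2]

def s_pair(a, b, k, p):
    """(s_k, s_{-k}), using the reciprocal relation s_{-k}(a, b) = s_k(b, a)."""
    return s_term(a, b, k, p), s_term(b, a, k, p)

p, a, b = 5, 1, 0          # toy system parameters, Q = p^2 + p + 1 = 31
e, r = 3, 7                # private keys, both coprime to Q

alice_public = s_pair(a, b, e, p)          # (s_e, s_{-e})
bob_public = s_pair(a, b, r, p)            # (s_r, s_{-r})

# Each side evaluates the sequence generated by the other's public pair.
alice_secret = s_pair(bob_public[0], bob_public[1], e, p)
bob_secret = s_pair(alice_public[0], alice_public[1], r, p)
```

Both parties end up with the pair $(s_{er}, s_{-er})$, which here can also be computed directly as `s_pair(a, b, e * r, p)`.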
Remark 1. The XTR [3] uses the characteristic sequences generated by an irreducible
polynomial of the form $f(x) = x^3 - ax^2 + a^p x - 1$ over $\mathbb{F}_{p^2}$ with period
$p^2 - p + 1$. Thus, XTR uses one characteristic sequence instead of a pair of reciprocal
characteristic sequences, since in this case $s_{-k} = s_k^p$. Hence the two terms
$s_k$ and $s_{-k}$ are dependent. For $q = p^2$, the two schemes have the same efficiency
when they are applied to the DH key agreement protocol, because the GH-DH
computes a pair of elements over $\mathbb{F}_{p^2}$ and shares a pair of elements over $\mathbb{F}_{p^2}$,
while the XTR-DH computes one element over $\mathbb{F}_{p^2}$ and shares one element over
$\mathbb{F}_{p^2}$.

3 The Approach 

One approach to generating sequences of a given period is to randomly select
fields and sequence polynomials and check whether they have the desired period.
However, determining the order of a sequence may be very difficult, and it may
require many attempts before a good sequence is found. We give a more systematic
approach to generating instances based on the following proposition (see [5]
for a proof).



Proposition 1. Let f be an irreducible polynomial of degree 3 over GF(q) and
t be a positive integer. The following are equivalent:

1. The sequence generated by f has period t.

2. t is the smallest integer such that f divides $x^t - 1$.

3. A root $\alpha$ of f has order t in the extension field $GF(q^3)$ of GF(q) generated
by f.

Thus, if we can find an element of order $q^2 + q + 1$ in the extension field
$GF(q^3)$, we can construct a polynomial whose sequence is of the desired period.
Note that the multiplicative group of $GF(q^3)$ is cyclic. The following lemma
helps us determine whether an element in a cyclic group has a given order.



Lemma 1. Let G be a cyclic group of order n and let t be a number dividing n
whose prime factorization is $t = p_1^{e_1} \cdots p_r^{e_r}$. Then an element $g \in G$ has order
exactly t if and only if $g^t = 1_G$ and $g^{t/p_i} \neq 1_G$ for all $i = 1, \ldots, r$, where $1_G$
denotes the identity element of G.
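Lemma 1 translates directly into an order test. As a minimal sketch, the function below applies it in the multiplicative group of integers modulo a prime, standing in for the cyclic group G; the paper applies the same test in $GF(q^3)$. The function name and the choice of group are illustrative assumptions.

```python
def has_exact_order(g, t, prime_factors, modulus):
    """Check, via Lemma 1, that g has multiplicative order exactly t modulo
    a prime `modulus`. `prime_factors` lists the distinct primes dividing t."""
    if pow(g, t, modulus) != 1:
        return False
    # g must not already be killed by any maximal proper divisor t / p_i
    return all(pow(g, t // l, modulus) != 1 for l in prime_factors)
```

For example, modulo 31 the group has order 30; the element 9 has order exactly 15, while 2 has order 5, so `has_exact_order(9, 15, [3, 5], 31)` holds and `has_exact_order(2, 15, [3, 5], 31)` does not.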

Thus, if we knew the factorization of $q^2 + q + 1$, we could easily determine
whether or not an element has this order. The problem we immediately encounter
is that the factorization of large numbers is a very difficult problem. One way 
to circumvent this problem is to choose primes p for which this factorization is 
much simpler. 




320 K.J. Giuliani and G. Gong 



4 Choosing “Good” Primes p 

We define a prime p to be good if the factorization of $q^2 + q + 1$ is known where
$q = p^2$. We have the following factorization

$q^2 + q + 1 = p^4 + p^2 + 1 = (p^2 + p + 1)(p^2 - p + 1)$

Let $p^+ = p^2 + p + 1$ and $p^- = p^2 - p + 1$. Let us now derive conditions for when
a prime $l$ divides either $p^+$ or $p^-$.

First, observe that neither 2 nor p are divisors since $q^2 + q + 1$ is odd and
congruent to 1 modulo p. Next, note that no prime $l$ divides both $p^+$ and $p^-$,
for then it would also have to divide $p^+ - p^- = 2p$.

Let us now consider the case $l = 3$. If $p \equiv 1 \pmod 3$, then $p^+ \equiv 0 \pmod 3$.
On the other hand, if $p \equiv 2 \pmod 3$, then $p^- \equiv 0 \pmod 3$. Thus, $q^2 + q + 1$
will always be divisible by 3.

For all other primes $l$, we see that $l$ will divide $p^+$ (respectively $p^-$) if and
only if p is a root of the polynomial $x^2 + x + 1$ (respectively $x^2 - x + 1$) in
$\mathbb{F}_l$. Observe that any root of $x^2 + x + 1$ (respectively $x^2 - x + 1$) has order
3 (respectively order 6) in the multiplicative group $\mathbb{F}_l^*$. This will occur only if
$l \equiv 1 \pmod 3$. The converse to this assertion, namely that if $l \equiv 1 \pmod 3$
then $x^2 + x + 1$ (respectively $x^2 - x + 1$) is reducible, is easily seen. Thus, the
condition $l \equiv 1 \pmod 3$ is necessary for $l$ to divide $p^+$ or $p^-$, and so we only
need to consider small primes $l$ of this type.

If $l \equiv 1 \pmod 3$, we can determine the roots of these polynomials over $\mathbb{F}_l$
using the quadratic formula. The roots of $x^2 + x + 1$ are $2^{-1}(-1 \pm \sqrt{-3})$, and the
roots of $x^2 - x + 1$ are $2^{-1}(1 \pm \sqrt{-3})$, where the inverse is taken modulo $l$. Let
$r = 2^{-1}(-1 + \sqrt{-3})$ be one of the roots of $x^2 + x + 1$. Then, $-r - 1$ is the other
root. In addition, we see that $r + 1$ and $-r$ are the roots of $x^2 - x + 1$. Hence, $l$
divides $p^+$ if and only if $p \equiv r, -r - 1 \pmod l$, while $l$ divides $p^-$ if and only
if $p \equiv r + 1, -r \pmod l$. Note that $r, -r - 1, r + 1, -r$ are easily computable
given $l$, and independent of the value of p.
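The residue classes described above are easy to compute for a small prime $l \equiv 1 \pmod 3$. The sketch below finds $\sqrt{-3}$ modulo $l$ by exhaustive search, which is an assumption made for brevity (a real implementation would use a modular square-root algorithm); the function names are illustrative.

```python
def sqrt_minus3(l):
    """A square root of -3 modulo a prime l with l % 3 == 1 (exhaustive search)."""
    for x in range(l):
        if (x * x + 3) % l == 0:
            return x
    raise ValueError("-3 is not a square modulo l")

def divisor_residues(l):
    """Return (plus, minus): the residues of p modulo l for which l divides
    p^2 + p + 1 and p^2 - p + 1 respectively."""
    inv2 = (l + 1) // 2                      # inverse of 2 modulo an odd l
    r = inv2 * (-1 + sqrt_minus3(l)) % l     # a root of x^2 + x + 1 mod l
    plus = {r, (-r - 1) % l}
    minus = {(r + 1) % l, (-r) % l}
    return plus, minus
```

For $l = 7$ this gives the classes {2, 4} for $p^+$ and {3, 5} for $p^-$; for instance $2^2 + 2 + 1 = 7$ and $3^2 - 3 + 1 = 7$.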

We now have a strategy for finding a good prime p. We select a bound B and
a random prime p and attempt trial division of $p^+$ and $p^-$ by all primes < B.
We then perform a primality test on the remaining large factors of $p^+$ and $p^-$.
If both are prime then we have found a desirable p. Otherwise we try again with
another random p.

Note that we need not attempt trial division by those primes $l$ for which
$l \equiv 2 \pmod 3$. Moreover, we can determine if $l$ divides $p^+$ or $p^-$ by
simply checking if the much smaller value p matches one of $r, -r - 1, r + 1, -r$
modulo $l$. Since these numbers are independent of p, they can be precomputed
once for all primes p to be tested. This enables us to determine divisibility by
calculating p modulo $l$ instead of working with the much larger numbers $p^+$ and
$p^-$.
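The search strategy described in this section can be sketched as follows. This is a toy version under stated assumptions: it uses a probabilistic Miller-Rabin test written inline, finds the root r by exhaustive search, and is meant for small bit sizes only; all function names are hypothetical and it is not the authors' NTL implementation.

```python
import random

def is_probable_prime(n, rounds=24):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def residue_table(bound):
    """Pairs (l, r) for primes l = 1 (mod 3), l < bound, r a root of x^2+x+1 mod l."""
    table = []
    for l in range(7, bound, 6):             # odd l with l % 3 == 1
        if is_probable_prime(l):
            r = next(x for x in range(1, l) if (x * x + x + 1) % l == 0)
            table.append((l, r))
    return table

def strip(n, l):
    """Divide out the full power of l from n; return (cofactor, exponent)."""
    s = 0
    while n % l == 0:
        n, s = n // l, s + 1
    return n, s

def try_prime(p, table):
    """Return factorizations (as dicts) of p^+ and p^- if p is good, else None."""
    q_plus, q_minus = p * p + p + 1, p * p - p + 1
    f_plus, f_minus = {}, {}
    if p % 3 == 1:
        q_plus, f_plus[3] = strip(q_plus, 3)
    elif p % 3 == 2:
        q_minus, f_minus[3] = strip(q_minus, 3)
    for l, r in table:
        m = p % l                            # only p mod l is needed
        if m == r or m == l - r - 1:
            q_plus, f_plus[l] = strip(q_plus, l)
        elif m == r + 1 or m == l - r:
            q_minus, f_minus[l] = strip(q_minus, l)
    if is_probable_prime(q_plus) and is_probable_prime(q_minus):
        f_plus[q_plus], f_minus[q_minus] = 1, 1
        return f_plus, f_minus
    return None

def find_good_prime(bits, bound):
    """Search random primes of the given bit size until a good one is found."""
    table = residue_table(bound)
    while True:
        p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(p):
            result = try_prime(p, table)
            if result is not None:
                return (p,) + result
```

A call such as `find_good_prime(24, 1 << 10)` returns a prime p together with the factorizations of $p^+$ and $p^-$ as dictionaries mapping each prime to its exponent.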

If desired, one also may use some more advanced factorization methods on
$p^+$ and $p^-$ such as Pollard's (p - 1)-method [6] or the elliptic curve method [4].
Regardless, the goal is to obtain a value p for which the factorizations of $p^+$ and
$p^-$ are known.
5 Finding Elements of Order $q^2 + q + 1$

Now that we have a prime p for which the factorization of $q^2 + q + 1$ is known, we
would like to use this information to find an element of this order. Suppose we
have the factorization $q^2 + q + 1 = p_1^{e_1} \cdots p_r^{e_r}$. We know that $\mathbb{F}_{q^3}^*$ is a cyclic group
of order $q^3 - 1$. Let $G \subset \mathbb{F}_{q^3}^*$ be the unique cyclic subgroup of order $q^2 + q + 1$.
The map

$\psi : \mathbb{F}_{q^3}^* \to G, \quad \alpha \mapsto \alpha^{q-1}$

is a group homomorphism onto G. Thus, if $\alpha \in \mathbb{F}_{q^3}^*$ is selected at random, then
$\beta = \alpha^{q-1}$ is a random element in G. We know that $\beta$ has order dividing $q^2 + q + 1$.
We can now use the lemma to determine if $\beta$ is indeed an element of this order.

Now, an element $\beta \in G$ has order exactly $q^2 + q + 1$ if and only if it generates G.
For a randomly chosen element, this happens with probability $\varphi(q^2 + q + 1)/(q^2 + q + 1)$, where
$\varphi$ is the Euler $\varphi$-function. Given the factorization above, this works out to

$\frac{\varphi(q^2 + q + 1)}{q^2 + q + 1} = \frac{\varphi(p_1^{e_1})}{p_1^{e_1}} \cdots \frac{\varphi(p_r^{e_r})}{p_r^{e_r}} = \frac{p_1 - 1}{p_1} \cdots \frac{p_r - 1}{p_r}$
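The product formula can be evaluated exactly with rational arithmetic. The following is a minimal sketch; the function name is an illustrative choice, and the input is simply the list of distinct primes dividing $q^2 + q + 1$.

```python
from fractions import Fraction

def generator_probability(distinct_primes):
    """Probability phi(n)/n that a random element of a cyclic group of order n
    is a generator, given the distinct primes dividing n."""
    prob = Fraction(1)
    for l in distinct_primes:
        prob *= Fraction(l - 1, l)
    return prob
```

As a toy check, for q = 4 the order is $q^2 + q + 1 = 21 = 3 \cdot 7$ and the probability is $(2/3)(6/7) = 4/7$, matching $\varphi(21)/21 = 12/21$.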

If, for example, we take $B = 2^{16} = 65536$, then in the worst case, this probability
would work out to

$\frac{2}{3} \cdot \frac{6}{7} \cdots \frac{65520}{65521} \cdot \frac{q^+ - 1}{q^+} \cdot \frac{q^- - 1}{q^-}$,

where $q^+$ and $q^-$ are the large leftover factors respectively of $p^+$ and $p^-$. Thus,
one can find such an element $\beta$ with very high probability after only a few tries.

6 Constructing the Sequence

Suppose that we have an element $\beta \in \mathbb{F}_{p^6}$ of order $p^4 + p^2 + 1$. Let

$f(x) = (x - \beta)(x - \beta^{p^2})(x - \beta^{p^4}) = x^3 - ax^2 + bx - 1$

where $a = \beta + \beta^{p^2} + \beta^{p^4}$ and $b = \beta^{-1} + \beta^{-p^2} + \beta^{-p^4}$. Then we have the following
result [5].

Lemma 2. f is an irreducible polynomial of degree 3 over $\mathbb{F}_{p^2}$.

Corollary 1. f generates a sequence of period $p^4 + p^2 + 1 = q^2 + q + 1$ over
$\mathbb{F}_{p^2} = \mathbb{F}_q$.

Proof. This follows immediately from the preceding Lemma 2 and Proposition 1.

□
7 Obtaining a Representation for the Coefficients

There is one detail remaining. The representation that we currently have for a
and b are as elements in $\mathbb{F}_{p^6}$. We need to have some sort of representation for
them in $\mathbb{F}_{p^2}$.

7.1 A General Method

One method to do so is as follows. Suppose $g(x) = x^2 + c_1 x + c_0$ is an irreducible
polynomial over $\mathbb{F}_p$ such that the degree 6 polynomial $h(x) = x^6 + c_1 x^3 + c_0$ is also
irreducible over $\mathbb{F}_p$. Consider the field extensions to $\mathbb{F}_{p^2}$ and $\mathbb{F}_{p^6}$ given by these
two respective polynomials. Then we have the injective field homomorphism

$\mathbb{F}_{p^2} \to \mathbb{F}_{p^6}$

$u_1 x + u_0 \mapsto u_1 x^3 + u_0$

Suppose that we had used h as our representation of $\mathbb{F}_{p^6}$. Then a and b must
be of the form $a = u_1 x^3 + u_0$ and $b = v_1 x^3 + v_0$ in $\mathbb{F}_{p^6}$. Thus, we have the explicit
representation of the sequence over $\mathbb{F}_{p^2}$ as $a = u_1 x + u_0$ and $b = v_1 x + v_0$ where
the extension to the field $\mathbb{F}_{p^2}$ is given by g.

This method will suffice if one is not constrained with regards to the extension
polynomial to be used. One can simply search for a polynomial of the form given
by h which is also irreducible, from which one obtains the irreducible polynomial
g. In addition, this method will work if one has a specific extension polynomial
$g(x) = x^2 + c_1 x + c_0$ in mind, as long as $h(x) = x^6 + c_1 x^3 + c_0$ is also irreducible
over $\mathbb{F}_p$. Note that this is not always true.



7.2 A Method for a Specific Polynomial Representation 



Suppose that one would like to use a specific polynomial representation given
by $g(x) = x^2 + c_1 x + c_0$. If $x^6 + c_1 x^3 + c_0$ were not irreducible over $\mathbb{F}_p$, then one
could not use the method given in the previous subsection. This subsection lists
another method for doing so.

First, choose an irreducible degree 3 polynomial $\tilde{g}(y) = y^3 + d_2 y^2 + d_1 y + d_0$.
We can extend to the field $\mathbb{F}_{p^2}$ by using the representation

$\mathbb{F}_{p^2} = \mathbb{F}_p[x] / (g(x))$

We can then consider $\tilde{g}$ as a polynomial over $\mathbb{F}_{p^2}$ and extend the field once more.
Note that since $\tilde{g}$ is irreducible over $\mathbb{F}_p$ and its degree is coprime to that of g, it
is also irreducible over $\mathbb{F}_{p^2}$. Thus we have the extension

$\mathbb{F}_{p^6} = \mathbb{F}_{p^2}[y] / (\tilde{g}(y))$
Note that the subfield $\mathbb{F}_{p^2}$ corresponds to those elements with no y-term associated
to it. Thus, it would be easy to obtain a and b using the representation
of g.

The problem we encounter with this method is that it is somewhat tedious
to do a double field extension. In fact, certain software packages such as NTL [7]
allow single extensions but do not support multiple field extensions. It would be
much simpler if we could use only a single field extension. To this end, we now
show a method to interpolate the polynomials $g$ and $\tilde{g}$ into an irreducible
degree 6 polynomial h.

Let $\gamma$ and $\eta$ be roots of $g$ and $\tilde{g}$ respectively. Thus, we have that

$g(x) = x^2 + c_1 x + c_0 = (x - \gamma)(x - \gamma^p)$

and

$\tilde{g}(y) = y^3 + d_2 y^2 + d_1 y + d_0 = (y - \eta)(y - \eta^p)(y - \eta^{p^2})$

We construct the polynomial h to be the minimal polynomial of the element
$\zeta = \gamma\eta$. Since $\gamma \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$ and $\eta \in \mathbb{F}_{p^3} \setminus \mathbb{F}_p$, we see that this polynomial must
be of degree 6 and is explicitly given as

$h(z) = z^6 + e_5 z^5 + e_4 z^4 + e_3 z^3 + e_2 z^2 + e_1 z + e_0$

$= (z - \zeta)(z - \zeta^p)(z - \zeta^{p^2})(z - \zeta^{p^3})(z - \zeta^{p^4})(z - \zeta^{p^5})$

$= (z - \gamma\eta)(z - \gamma^p\eta^p)(z - \gamma\eta^{p^2})(z - \gamma^p\eta)(z - \gamma\eta^p)(z - \gamma^p\eta^{p^2})$

where we have used the relations $\gamma^{p^2} = \gamma$ and $\eta^{p^3} = \eta$.

It turns out that we can derive the coefficients of h directly from the coefficients
of $g$ and $\tilde{g}$. With some work, it is easily seen that we have the following
relations

$e_5 = -c_1 d_2$
$e_4 = c_1^2 d_1 + c_0 d_2^2 - 2 c_0 d_1$
$e_3 = 3 c_0 c_1 d_0 - c_1^3 d_0 - c_0 c_1 d_1 d_2$
$e_2 = c_0 c_1^2 d_0 d_2 - 2 c_0^2 d_0 d_2 + c_0^2 d_1^2$
$e_1 = -c_0^2 c_1 d_0 d_1$
$e_0 = c_0^3 d_0^2$
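These relations are straightforward to evaluate mechanically. The sketch below computes the coefficients of h from those of the quadratic and the cubic; the function name and the optional modulus argument are illustrative assumptions. Passing `p=None` keeps the result over the integers, which allows a sanity check with polynomials that split over the integers.

```python
def h_coefficients(c1, c0, d2, d1, d0, p=None):
    """Coefficients [1, e5, e4, e3, e2, e1, e0] of h(z), whose roots are the
    products of a root of x^2 + c1 x + c0 with a root of y^3 + d2 y^2 + d1 y + d0.
    If p is given, the coefficients are reduced modulo p."""
    e5 = -c1 * d2
    e4 = c1 * c1 * d1 + c0 * d2 * d2 - 2 * c0 * d1
    e3 = 3 * c0 * c1 * d0 - c1 ** 3 * d0 - c0 * c1 * d1 * d2
    e2 = c0 * c1 * c1 * d0 * d2 - 2 * c0 * c0 * d0 * d2 + c0 * c0 * d1 * d1
    e1 = -c0 * c0 * c1 * d0 * d1
    e0 = c0 ** 3 * d0 * d0
    coeffs = [1, e5, e4, e3, e2, e1, e0]
    return coeffs if p is None else [c % p for c in coeffs]

def evaluate(coeffs, z):
    """Evaluate a polynomial given by its coefficients (highest degree first)."""
    value = 0
    for c in coeffs:
        value = value * z + c
    return value
```

As a check over the integers, taking $(x-2)(x-3) = x^2 - 5x + 6$ and $(y-1)(y-2)(y-4) = y^3 - 7y^2 + 14y - 8$, the resulting h vanishes at each of the six products 2, 4, 8, 3, 6 and 12.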

Elements in $\mathbb{F}_{p^6}$ in this polynomial representation have the form (replacing
z with $\zeta$)

$u_0 + u_1\zeta + u_2\zeta^2 + u_3\zeta^3 + u_4\zeta^4 + u_5\zeta^5$

$= u_0 + u_1(\gamma\eta) + u_2(\gamma^2\eta^2) + u_3(\gamma^3\eta^3) + u_4(\gamma^4\eta^4) + u_5(\gamma^5\eta^5)$

Using the relations $\gamma^2 = -c_1\gamma - c_0$ and $\eta^3 = -d_2\eta^2 - d_1\eta - d_0$, we can reduce
to a representation of the form

$v_0 + v_1\gamma + v_2\eta + v_3\gamma\eta + v_4\eta^2 + v_5\gamma\eta^2$

Note that the subfield $\mathbb{F}_{p^2}$ consists of those elements with no $\eta$-term. We can
use this to determine a and b in the representation using g.
Example 1. Suppose $p \equiv 3 \pmod 4$. Then $g(x) = x^2 + 1$ is irreducible over $\mathbb{F}_p$.
Let $\tilde{g}(y) = y^3 - y - k$ be irreducible for some $k \in \mathbb{F}_p$. Then $h(z) = z^6 + 2z^4 +
z^2 + k^2$. An element can be represented as

$u_0 + u_1\zeta + u_2\zeta^2 + u_3\zeta^3 + u_4\zeta^4 + u_5\zeta^5$

$= u_0 + u_1(\gamma\eta) + u_2(\gamma^2\eta^2) + u_3(\gamma^3\eta^3) + u_4(\gamma^4\eta^4) + u_5(\gamma^5\eta^5)$

$= u_0 + k(u_5 - u_3)\gamma + k u_4\eta + (u_1 - u_3 + u_5)\gamma\eta + (u_4 - u_2)\eta^2 + k u_5\gamma\eta^2$

If this element were in the subfield $\mathbb{F}_{p^2}$, then the coefficients of the terms with an
$\eta$ must be 0. This yields the relations $u_2 = u_4 = u_5 = 0$ and $u_1 = u_3$. Thus, such
an element can be represented as $u_0 - k u_1 x$ in the field $\mathbb{F}_{p^2}$ where $g(x) = x^2 + 1$
was used as the extension polynomial.



Example 2. Suppose $p \equiv 2 \pmod 3$. Then $g(x) = x^2 + x + 1$ is irreducible
over $\mathbb{F}_p$. Let $\tilde{g}(y) = y^3 - y - k$ be irreducible for some $k \in \mathbb{F}_p$. Then $h(z) =
z^6 + z^4 - 2kz^3 + z^2 - kz + k^2$. An element can be represented as

$u_0 + u_1\zeta + u_2\zeta^2 + u_3\zeta^3 + u_4\zeta^4 + u_5\zeta^5$

$= u_0 + u_1(\gamma\eta) + u_2(\gamma^2\eta^2) + u_3(\gamma^3\eta^3) + u_4(\gamma^4\eta^4) + u_5(\gamma^5\eta^5)$

$= u_0 + k(u_3 - u_5) - k u_5\gamma + (u_3 - u_5)\eta + (u_1 - u_5 + k u_4)\gamma\eta$
$\quad + (-u_2 - k u_5)\eta^2 + (u_4 - u_2 - k u_5)\gamma\eta^2$

If this element were in the subfield $\mathbb{F}_{p^2}$, then the coefficients of the terms with
an $\eta$ must be 0. This yields the relations $u_4 = 0$ and $u_1 = u_3 = u_5 = -k^{-1}u_2$.
Thus, such an element can be represented as $u_0 - k u_1 x$ in the field $\mathbb{F}_{p^2}$ where
$g(x) = x^2 + x + 1$ was used as the extension polynomial.

8 The Algorithm 



The method is summarized here in algorithmic form. 

Input ( bit size b, bound B > 3 )

Precomputation:

    for each prime $l \equiv 1 \pmod 3$, 3 < l < B
        compute $r = 2^{-1}(-1 + \sqrt{-3}) \pmod l$
        store the pair (l, r) in table A
    end for
Prime Generation:

    loop
        generate a random prime p of b bits
        calculate $p^+$ and $p^-$
        set $q^+ = p^+$ and $q^- = p^-$
        if $p \equiv 1 \pmod 3$
            let s be the highest power of 3 dividing $q^+$
            divide $q^+$ by $3^s$
            store (3, s) in table B
        else
            let s be the highest power of 3 dividing $q^-$
            divide $q^-$ by $3^s$
            store (3, s) in table C
        end if

        for each entry (l, r) in table A
            compute m = p (mod l)
            if m = r or m = l - r - 1
                let s be the highest power of l dividing $q^+$
                divide $q^+$ by $l^s$
                store (l, s) in table B
            else if m = r + 1 or m = l - r
                let s be the highest power of l dividing $q^-$
                divide $q^-$ by $l^s$
                store (l, s) in table C
            end if
        end for

        do primality tests on both $q^+$ and $q^-$
        if both $q^+$ and $q^-$ are prime
            exit the loop
        else
            clear tables B and C
        end if

        return to the top of the loop
    end loop



Constructing the Sequence:
    select an irreducible polynomial h of degree 6 over F_p using one of the
    methods in section 7

    loop
        choose a random element α ∈ F_{p^6}
        compute β = α^(p^2 - 1)

        for l = 3, q+, q-, and each prime l in tables B and C
            if β^((p^4 + p^2 + 1)/l) = 1
                return to top of loop
            end if
        end for

        exit loop
    end loop

compute a' = P + 
compute b' = 

derive the irreducible degree 2 polynomial g from h using the appropriate 
method from section 7 

derive a and b from a' and b' respectively using the appropriate method 
from section 7 



Output ( p, g, a, b )
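The precomputation and the small-factor stripping inside the prime generation loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' NTL implementation: p+ and p- are taken to be p^2 + p + 1 and p^2 - p + 1, r is taken to be a root of x^2 + x + 1 modulo l (which is what the residue tests m = r, l - r - 1, r + 1, l - r require), and r is found by brute force rather than by a modular square root. All function names are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def build_table_a(bound):
    """Pairs (l, r) for primes l = 1 (mod 3), 3 < l < bound, with r^2 + r + 1 = 0 (mod l)."""
    table = []
    for l in range(5, bound):
        if l % 3 == 1 and is_prime(l):
            r = next(x for x in range(1, l) if (x * x + x + 1) % l == 0)
            table.append((l, r))
    return table

def classify(p, l, r):
    """Return '+' if l divides p^2 + p + 1, '-' if l divides p^2 - p + 1, else None."""
    m = p % l
    if m == r or m == l - r - 1:
        return "+"
    if m == r + 1 or m == l - r:
        return "-"
    return None

def strip(q, l):
    """Divide q by the highest power of l; return (cofactor, exponent)."""
    s = 0
    while q % l == 0:
        q //= l
        s += 1
    return q, s

def strip_small_factors(p, table_a):
    q_plus, q_minus = p * p + p + 1, p * p - p + 1
    table_b, table_c = [], []
    if p % 3 == 1:
        q_plus, s = strip(q_plus, 3)
        table_b.append((3, s))
    else:
        q_minus, s = strip(q_minus, 3)
        table_c.append((3, s))
    for l, r in table_a:
        side = classify(p, l, r)
        if side == "+":
            q_plus, s = strip(q_plus, l)
            table_b.append((l, s))
        elif side == "-":
            q_minus, s = strip(q_minus, l)
            table_c.append((l, s))
    return q_plus, q_minus, table_b, table_c
```

The point of the table lookup is that no division of the large numbers q+ and q- is attempted unless the residue of p already shows that l is a factor; a real implementation would follow this with primality tests on the two cofactors.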

A Implementation and Timing Analysis 

The algorithm listed in the previous section was implemented in C++ on a 
UNIX server using the NTL [7] number theory library. The dominant step of 
the algorithm was, by far, prime generation. As a result, timing analysis was 
performed on this portion of the algorithm with the results listed in table 1 and 
table 2. 

The bounds B = 2^° and B = 2^® were both tested. 10 instances were 
generated for each bound and for each bit size from 160 to 320 by 8 bits. The 
number of primes searched before the candidate was found, the CPU time, and 
the bit sizes of the largest prime factors q+ and q-, respectively, of p+ and p-
were recorded. The class of instances with bound B = 2^® were, on average 
generated much faster than those with bound B = 2^®, especially for the larger 
bit sizes. The bit sizes of q+ and q- were only marginally larger with the bound
B = 2i®. 

In addition, attempts were made to generate instances with the bound B = 4. 
This was attempted for bit sizes 160 to 240 by 8. The average CPU time using 
this bound was substantially longer than with the other two bounds. Thus, for 
some bit sizes, fewer than 10 instances were generated. Results for this bound 
are listed in table 3. 

B Example Instances 

This section lists some example instances. 

Example 3. The prime p listed here has 160 bits. 
p = 1276593311082943972800140646397807976959837132709 
p+ = 162969048190171416273573068072518628828072882395530274432447968 
9642812431700906503271994314811391 

p- = (3)5432301606339047209119102269083954294269096079842498525674379 
33899070716802703629106024880181991 
g(x) = x^2 + x + 1

Table 1. Timing results for prime generation with bound B =

bits   # searched   time (sec)   q+      q-
160    139.9        19           309.5   310.2
168    529.5        79.6         327.5   326.8
176    318.9        47.6         342.4   343
184    184.9        33.9         358.7   358.3
192    388.2        90           375.9   374.6
200    441.5        90.6         391.7   390.9
208    432.2        93.5         407     406.1
216    395.5        127.1        420.3   426.4
224    445          117.4        439.7   441
232    295.9        83.4         455.7   457
240    418.5        118.9        467.8   472.2
248    847.8        304          488.1   489.3
256    1098.4       396.8        499.7   503.5
264    582.2        235.2        516.4   518.6
272    796.4        350.9        535.3   533.5
280    1233.2       579          550.8   552.9
288    1299.4       645.6        567.8   563.5
296    1144.1       590.1        584.5   581.1
304    1320.8       781.4        595     596.8
312    467.6        288.8        615.4   614.2
320    885.3        585.3        634.1   634.2

Table 2. Timing results for prime generation with bound B = 2^®.

bits   # searched   time (sec)   q+      q-
160    106.3        19.8         307.6   306.3
168    108.2        21.8         322.2   328.1
176    115.4        23.4         335.6   334.5
184    78.9         18.9         354.6   353
192    221.4        57.9         375.2   371.7
200    162.2        43           388.4   387.7
208    236.8        66.7         397.7   398.4
216    240          76.8         421.1   399.2
224    158.9        53.6         437.9   427.2
232    224          78.3         444     448.6
240    106.5        41.1         460.8   468.3
248    433          184.2        481.6   479.1
256    327.4        141.8        493.7   498.9
264    160.7        75.3         510.7   511.9
272    331.8        186.5        533     530
280    300.9        162.4        544     547.9
288    405          230.7        560.8   560.2
296    803          476.9        584.2   575
304    604          398.7        592.5   592.6
312    291.8        202.9        611.1   608.2
320    319.4        264.8        624.3   625.7

Table 3. Timing results for prime generation with bound B = 4.

bits   # searched   time (sec)   q+      q-       # instances
160    3721.5       383.5        319.5   319.5    2
168    11727.29     1383.29      335     334.29   7
176    17520.5      2114.5       350     352      2
184    8214.8       1215.6       366.6   367.3    10
192    23773        3751         384     383      1
200    17927        2918.5       400     398      2
208    13146        2203         413     415      1
216    21012        4238         431     430      1
224    6769.5       1412         447     446.5    2
236    10056        2254         462.5   463      2
240    30651        6889         477     479      1


a = [85222366791300364439001551384914707254973562335]x
    + [1209115664825072234387309339396575750110393685169]
b = [246868582389120965340698690747362673995248240017]x
    + [893048047568985860793458252220232059855756667683]

Example 4. The prime p listed here has 160 bits.
p = 1353081569040243787002953026589849378107407355807
p+ = (3)61027657749213600464084484669844109431755772695720700362696426
1429092268344297821850519634659019

p- = 18308297324764080139225345400953232829526731808689148477428122967
13270898979713766795344089265443
g(x) = x^2 + 1

a = [965913929992835996699498327367567768167816904081]x
    + [114439484991500531161708001866868463982927203984]
b = [499746686903428250077058835004207585322076077588]x
    + [241440714372014101653045391345358648608519166355]

Abstract. We introduce a hypothetical situation in which low-exponent
RSA is used to encrypt IP packets, TCP segments, or TCP segments
carried in IP packets. In this scenario, we explore how the
Coppersmith/Howgrave-Graham method can be used, in conjunction with the
TCP and IP protocols, to decrypt specific packets when they get
re-transmitted (due to a denial-of-service attack on the receiver's side).
We draw conclusions on the applicability of the
Coppersmith/Howgrave-Graham method, its interaction with "guessing", and the
difficulties of building a secure system by combining well-known building blocks.



1 Introduction 

We consider a scenario in which there are many internally secure TCP/IP
networks. These communicate across an insecure internet. To enable secure
communication, the firewalls for each secure network take the IP_sec packet from the
secure network, encrypt it with RSA [2] and treat the encrypted packet as data
for an IP_insec packet directed at the firewall of the destination network. That
firewall then decrypts the packet and injects it into its secure network, as shown
in figure 1.

From the point of view of the IP_sec layers, the firewalls and the IP_insec
communication together form a (complex) link layer over which the IP_sec packet travels.

The attacker is assumed to listen to the IP_insec packets transmitted across
the insecure internet from the sender (A) to the receiver (B). By flooding B, or a
switch near B, with other traffic, the attacker can (at least with high probability)
cause B to miss a transmission from A. IP is inherently an unreliable protocol,
so the higher levels above IP (e.g. TCP) will have mechanisms to re-transmit
the lost message. What information can the attacker gain in this scenario?

The Coppersmith/Howgrave-Graham method [4,7,8,11] is encapsulated in
the following theorem.

Theorem 1. Let P be a monic polynomial of degree $\delta$ in one variable modulo
an integer N (of unknown factorisation). Then one can find, in time polynomial
in $(\log N, \delta)$, all integers $x_0$ such that $P(x_0) \equiv 0 \pmod N$ and $|x_0| < N^{1/\delta}$.

The method forms an $h\delta \times h\delta$ lattice, where h is the control parameter. To reach
$N^{1/\delta}$ one needs h to be arbitrarily large, and the actual bound on $x_0$ achieved
varies with h, as shown in table 1 for the case of $\delta = 3$.

* The authors are grateful to Dr. D. Coppersmith, Dr. N.A. Howgrave-Graham and
Mr. A.J. Holt for their contributions to this work.

Fig. 1. Protocol layers in the scenario



Table 1. x0 as a function of h; δ = 3

h     2       3        4        5         6         7       ...   67
x0    N^0.2   N^0.25   N^0.27   N^0.286   N^0.294   N^0.3   ...   N^0.33



In this paper, we are concerned with recovering bit-fields from IP packets,
generally 16-bit fields. Table 2 shows how many bits we can expect to recover
for various values of h in various scenarios for the size of the RSA modulus (512,
1024 or 2048 bits), choices of the RSA exponent, and the presence or absence of
checksum wrapping (as explained later).
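The tabulated bounds can be reproduced with a few lines of Python. The closed form (h - 1)/(hδ - 1) for the exponent of N is not stated in the text; it is the bound usually quoted for Howgrave-Graham's formulation and is used here as an assumption because it matches the values in Table 1 and Table 2. The sketch also assumes, as the later sections describe, that without checksum wrapping the polynomial has degree e in α^e (so the bound on α is the e-th root of the bound on α^e), and that with checksum wrapping it has degree e^2 in α. The function names are hypothetical.

```python
def bound_exponent(h, delta):
    """Exponent t such that roots |x0| < N^t are found with control parameter h."""
    return (h - 1) / (h * delta - 1)

def recoverable_bits(modulus_bits, e, h, checksum_wrap):
    """Approximate number of bits of alpha that can be recovered."""
    if checksum_wrap:
        # degree e^2 polynomial in alpha
        return modulus_bits * bound_exponent(h, e * e)
    # degree e polynomial in alpha^e: bound on alpha^e, then take the e-th root
    return modulus_bits * bound_exponent(h, e) / e

for h in (2, 3, 4, 5, 6, 7, 67):
    print(h, round(bound_exponent(h, 3), 3))
print(round(recoverable_bits(512, 3, 2, False), 2))
print(round(recoverable_bits(512, 5, 4, False), 2))
```

Under these assumptions a 16-bit field is within reach whenever `recoverable_bits` is at least 16, which is how the choice of h for each column can be read.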



Table 2. Values for φ for recovering x0 < 2^φ

                         e = 3                                            e = 5
           No CS Wrap              CS Wrap               No CS Wrap             CS Wrap
        512     1024    2048    512     1024    2048    512     1024    2048    512     1024    2048
h = 2   34.13   68.27   136.53  30.12   60.24   120.47  11.38   22.76   45.51   10.44   20.89   41.8
h = 3   42.66   85.33   170.67  39.38   78.77   157.54  14.63   29.26   58.51   13.84   27.68   55.35
h = 4   46.54   93.09   186.18  43.89   87.78   175.54  16.17   32.34   64.67   15.52   31.03   62.06
h = 5   48.76   97.52   195.04  46.55   93.09   186.18  17.07   34.14   68.26   16.52   33.03   66.06
h = 6   50.19   100.39  200.78  48.3    96.60   193.20  17.66   35.31   70.62   17.18   34.36   68.72

The complexity of the lattice reduction, which is the dominant step in the
Coppersmith/Howgrave-Graham method, is

$$O(h^9 \delta^6 (\log N)^3), \qquad (1)$$

assuming classical arithmetic. From this and table 1, we see the importance of
minimising both h and $\delta$.
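To see what equation (1) implies for the trade-offs discussed later, the scaling can be tabulated directly. The sketch below assumes that the cost grows as h^9 δ^6 (log N)^3; the exponents 6 and 3 are the ones the text relies on (a saving of 3^6 when δ drops from 9 to 3, a factor of about 8 when the modulus length doubles), while the exponent of h is an assumption of this sketch. The function name is hypothetical and the result is a relative figure, not a running time.

```python
def relative_cost(h, delta, log_n):
    """Relative lattice-reduction cost, assuming O(h^9 * delta^6 * (log N)^3)."""
    return h ** 9 * delta ** 6 * log_n ** 3

# Doubling the modulus length at fixed h and delta
print(relative_cost(2, 3, 1024) / relative_cost(2, 3, 512))

# Treating a degree-9 polynomial in alpha as degree 3 in alpha^3
print(relative_cost(2, 9, 512) / relative_cost(2, 3, 512))
```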

Most of the computation for this paper was done in Maple but Victor Shoup’s 
NTL[12] library for C++ was used for the time-critical lattice reduction part. 

2 IP Packets 

Figure 2 shows a diagram of an IP packet. To understand the attacks on IP 
datagrams, we need to understand the IP datagram header. A good general 
reference on TCP and IP is [13]. 



    bits 0-15                                               bits 16-31
    4-bit version | 4-bit header length | 8-bit type of     16-bit total length (in bytes)
        service (TOS)
    16-bit identification                                   3-bit flags | 13-bit fragment offset
    8-bit time to live (TTL) | 8-bit protocol               16-bit header checksum
    32-bit source IP address
    32-bit destination IP address
    options (if any)
    data

    (the fields from version to destination IP address occupy 20 bytes)

Fig. 2. IP datagram, showing the fields in the IP header




In the denial-of-service scenario mentioned in the introduction, the higher 
protocols above IP would cause a re-transmission. The IP layer is unaware that 
this is a re-transmission, and sends this as a new packet. What fields change 
between this re-transmission and the original transmission? The only two fields 
that we expect to change are the following. 

— The 16-bit identification field. This normally increases by 1 every time the 
IP layer sends a packet. 

— The one’s complement sum of the header, stored in the 16-bit checksum 
field. Every time the 16-bit identification field is incremented, the checksum 
is decremented. Care must be taken if the checksum exceeds 65535, as it will 
then wrap around and restart from 1. 
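The coupling between the identification field and the checksum can be illustrated with a short Python sketch. It is an illustration under assumptions, not part of the original work: the header values (addresses 10.0.0.1 and 10.0.0.2, TTL 64, protocol 6, total length 40) are made up, and the checksum routine is the usual one's complement sum of 16-bit words. The function names are hypothetical.

```python
import struct

def internet_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_checksum(ident: int) -> int:
    """Checksum of a 20-byte IPv4 header whose checksum field is zeroed."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40,          # version/header length, TOS, total length
        ident, 0,             # identification, flags/fragment offset
        64, 6, 0,             # TTL, protocol (TCP), checksum placeholder
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    return internet_checksum(header)

# Ordinary case: identification +1 gives checksum -1
print(header_checksum(0x1235) - header_checksum(0x1234))

# Wrapping case for this particular header: the change is -1 + 65535
print(header_checksum(0x66CF) - header_checksum(0x66CE))
```

In the wrapping case the difference is off by 65535 from the ordinary one, which is the correction term that appears in the attack formulas.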
3 Attacks on IP Packets

First we need to look at the denial of service attack in more detail.
In the denial of service attack, letting

$$c_1 = m^e, \qquad c_2 = (m + x)^e$$

gives us a basic idea of how an attack could be implemented. However, we are
more specifically interested in what two IP packets would look like when
represented as $c_1$ and $c_2$, when the IP_sec identification fields differ by $\alpha$. Using the
knowledge about IP packets it is possible to construct a similar general formula,
which is shown and explained below.



$$c_1 = m^e$$

$$c_2 = \bigl(m + (2^{48}\alpha - \alpha \pm 65535) \times 2^{64+d}\bigr)^e$$

Here the term $2^{48}\alpha - \alpha$ is explained in note 1, the term $\pm 65535$ in note 2 and
the factor $2^{64+d}$ in note 3.

1. This is because the id field is 48 bits from the checksum field and increases by
   1 each time a packet is sent. The $-\alpha$ is because when the id field increments
   by 1, the checksum decrements by 1.

2. The $\pm 65535$ only applies if the IP_sec checksum wraps around, something
   which cannot be determined from the encrypted packets. If the checksum
   does not wrap around between the two packets then this value is 0.

3. This $2^{64+d}$ is derived from the checksum being 64 bits (32-bit source address
   and 32-bit destination address) from the end of the header, plus d, which is the
   number of bits of data appended to the end of the packet.

Next, two RSA-encrypted versions of the same (apart from identification and
checksum) IP_sec packet are required. Define these IP_insec data fields to be $eip_1$
and $eip_2$. These correspond to the IP_sec packets $ip_1$ and $ip_2$, and would
have been encrypted by taking $eip_j = ip_j^e \bmod N$.

Letting $c_1$ and $c_2$ be the symbolic forms of the two encrypted packets $eip_1$ and
$eip_2$, it is possible to calculate resultants. This is done by taking the resultant
of $c_1 - eip_1$ and $c_2 - eip_2$, i.e. of $m^e - eip_1$ and $(m \pm x)^e - eip_2$, with respect to
m.

Taking resultants will give a polynomial in $\alpha$ of degree $e^2$ (though it may be
possible to reduce this to e). This is the polynomial satisfied by the unknown, but
comparatively small, $\alpha$. We use the Coppersmith/Howgrave-Graham method [4,
7,8] to find $\alpha$, by solving the appropriate lattice.

Once $\alpha$ is known, we take the greatest common divisor with respect to z:

$$\gcd\bigl(z^e - eip_1,\ (z \pm \alpha \times (2^{48} - 1) \times 2^{64+d})^e - eip_2\bigr) \bmod N = z - ip_1 - \lambda N$$

where z is an indeterminate, and $\lambda$ represents the fact that we are interested in
$ip_1 \pmod N$ (in practice $\lambda = 1$). This gives

$$z - \bigl(\gcd(z^e - eip_1,\ (z + (2^{48}\alpha - \alpha) \times 2^{64+d})^e - eip_2) \bmod N\bigr) - \lambda N = ip_1$$

Therefore we have recovered $ip_1$ and broken the RSA encryption on these
particular IP packets.
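The gcd step is the related-message technique commonly attributed to Franklin and Reiter: once the difference between the two plaintexts is known, the two polynomials share the linear factor z - ip_1 modulo N. The Python sketch below (Python 3.8 or later) illustrates it on made-up data. Everything concrete in it is an assumption: the toy modulus is the product of the Mersenne primes 2^31 - 1 and 2^61 - 1, far too small for real use, the plaintext is an arbitrary byte string, α = 5 and d = 8 are arbitrary, and checksum wrapping is ignored. The Euclidean algorithm is run modulo N and simply fails with an exception if a leading coefficient is not invertible. Function names are hypothetical.

```python
from math import comb

def poly_trim(p):
    """Drop zero leading coefficients (coefficients are stored low to high)."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def poly_mod(a, b, n):
    """Remainder of a divided by b over Z_n; b's leading coefficient must be a unit."""
    a = a[:]
    inv = pow(b[-1], -1, n)
    while len(a) >= len(b):
        factor = a[-1] * inv % n
        shift = len(a) - len(b)
        for i, coeff in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * coeff) % n
        a = poly_trim(a)
    return a

def poly_gcd(a, b, n):
    """Monic gcd computed with the Euclidean algorithm over Z_n."""
    while b:
        a, b = b, poly_mod(a, b, n)
    inv = pow(a[-1], -1, n)
    return [c * inv % n for c in a]

def shifted_power_poly(delta, e, c, n):
    """Coefficients of (z + delta)^e - c modulo n."""
    coeffs = [comb(e, k) * pow(delta, e - k, n) % n for k in range(e + 1)]
    coeffs[0] = (coeffs[0] - c) % n
    return coeffs

def recover_message(c1, c2, delta, e, n):
    """Recover m from c1 = m^e and c2 = (m + delta)^e (mod n), delta known."""
    a = [(-c1) % n] + [0] * (e - 1) + [1]      # z^e - c1
    b = shifted_power_poly(delta, e, c2, n)    # (z + delta)^e - c2
    g = poly_gcd(a, b, n)
    if len(g) != 2:
        raise ValueError("gcd is not linear")
    return (-g[0]) % n                          # g = z - m

n = (2 ** 31 - 1) * (2 ** 61 - 1)               # toy modulus, illustration only
e = 3
m = int.from_bytes(b"IP packet", "big")        # stand-in for ip_1
alpha, d = 5, 8
delta = alpha * (2 ** 48 - 1) * 2 ** (64 + d)
c1 = pow(m, e, n)
c2 = pow(m + delta, e, n)
recovered = recover_message(c1, c2, delta % n, e, n)
print(recovered == m)
```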

Symbolically, in the absence of checksum wrapping, and ignoring for
simplicity the large numerical factors, the equation for $\alpha$ is of the following form.

$$\alpha^9 + 3(c_1 - c_2)\alpha^6 + 3(c_1^2 + 7c_1c_2 + c_2^2)\alpha^3 + (c_1 - c_2)^3 \equiv 0 \pmod N$$

While this is of degree 9 in $\alpha$, which would imply reducing an 18 x 18 lattice
with h = 2, it is in fact a polynomial of degree 3 in $\alpha^3$, and solving it as such
only requires reducing a 6 x 6 lattice, giving a $3^6 = 729$ theoretical saving (see footnote 1).
Unfortunately, if the checksum does wrap, this simplification is not possible.
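The shape of this equation can be sanity-checked over the integers: with c1 = m^3 and c2 = (m + α)^3 the polynomial in α vanishes identically, so it also vanishes modulo any N. The short Python check below is only a sketch of that observation. It uses the simplified form without the large numerical factors, and the function name is hypothetical.

```python
def alpha_polynomial(alpha, c1, c2):
    """Resultant-style polynomial in alpha for e = 3, simplified form."""
    return (alpha ** 9
            + 3 * (c1 - c2) * alpha ** 6
            + 3 * (c1 * c1 + 7 * c1 * c2 + c2 * c2) * alpha ** 3
            + (c1 - c2) ** 3)

values = [alpha_polynomial(a, m ** 3, (m + a) ** 3)
          for m in range(1, 50) for a in range(-20, 21)]
print(all(v == 0 for v in values))
```

Only α^9, α^6, α^3 and a constant appear, which is why the substitution of a new unknown for α^3 reduces the degree from 9 to 3.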

Shown below is a summary of the timings in seconds to perform the lattice
reduction phase of this attack for various sizes of the public-key modulus.

Table 3. Times for IP reduction

NTL timings in seconds to lattice reduce
RedHat Linux 6.2 on 1GHz Pentium III with 500Mb RAM

Public exponent                       e = 3                      e = 5
RSA-type                       512    1024   2048       512     1024     2048
h (control parameter)          2      2      2          4       2        2
IP, no checksum wrapping       2      9      27         8068*   177      1386
IP, with checksum wrapping     653    3413   3976       †       793465   §

† Not implemented due to software restrictions. This would in fact have required
h = 5.

§ Not implemented in this report, due to the running time and resource
constraints.

* Taking α < 2^11 allowed h = 2; with e = 5 this formed a 10x10 matrix which
reduced in 19 seconds. Therefore guessing all possibilities for the top five bits
of a 16-bit identification field and using the Coppersmith/Howgrave-Graham
method to find the bottom 11 bits would take 2^5 x 19 = 608 seconds: over a
factor of 10 faster than direct solving.

Without checksum wrapping, and with h = 2, the e = 3 examples required a
6x6 lattice, and e = 5 required 10 x 10. With checksum wrapping and h = 2, the
e = 3 examples required an 18 x 18 lattice, and e = 5 required 50 x 50. In general,
we note that doubling the length of the modulus multiplies the running time by
about 8, which is what one would expect from the theoretical performance (as
given in equation (1)) of the LLL algorithm [9].

Footnote 1: The same happens for e = 5, and in fact we also save on h, being able to
take h = 4 rather than h = 5, as can be seen from table 2.



4 TCP/IP Sessions 



TCP (Transmission Control Protocol) is a common protocol to use above IP. It
provides a reliable connection-oriented service on top of the unreliable service
provided by IP. In the scenario in Figure 1, if an opening (i.e. carrying the
SYN flag) TCP packet is lost, the TCP packet will simply be re-transmitted
unchanged. In this case, the attack in the previous section is the only one possible.
However, it is possible to deny service for the entire TCP opening sequence, in
which case protocols above TCP may well start a new TCP session, which will
have its own sequence number.

Fig. 3. TCP segment, showing the fields in the TCP header

In a denial of service attack we are interested in what would change in a
TCP packet header, shown in figure 3. This is slightly more complex because
the TCP checksum is calculated (as in the case of the IP checksum) as a one's
complement sum of 16-bit words. In the case of TCP, the field that changes is
the sequence number. Unfortunately this is a 32-bit field. Therefore although we
know that the sequence number in many implementations (incorrectly: see [1,3,
10]) simply increments by 64000 every half second, the effect on the checksum
is rather different.

If the 32-bit sequence number increments by $64000\beta$, the high 16 bits of the
32-bit sequence number normally (see footnote 2) increase by $\beta$ and the low 16 bits
decrease by $1536\beta$ (since $64000 = 2^{16} - 1536$), so the overall change in the
checksum (as it is the one's complement sum) is $+1535\beta$. This is valid only if
$\beta < 2^{16}$ (otherwise the 32-bit nature of the field manifests itself). We assume
that $\beta \ll 2^{16}$.
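The +1535β rule can be checked on a toy example. The Python sketch below is an illustration under assumptions: the checksum is taken over just three 16-bit words (an arbitrary constant standing in for the rest of the segment, plus the two halves of the sequence number), the starting sequence number 0x12348000 is made up and chosen so that its low half exceeds 1536β for the values of β tried, and no wrap of the one's complement sum occurs. The function names are hypothetical.

```python
def ones_complement_checksum(words):
    """One's complement of the one's complement sum of 16-bit words."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def toy_tcp_checksum(seq, other=0x1000):
    """Checksum over a stand-in word plus the high and low halves of seq."""
    return ones_complement_checksum([other, seq >> 16, seq & 0xFFFF])

seq = 0x12348000
for beta in range(1, 9):
    change = toy_tcp_checksum(seq + 64000 * beta) - toy_tcp_checksum(seq)
    print(beta, change, 1535 * beta)
```

Each line should show the observed change equal to 1535β: the high half gains β, the low half loses 1536β, and the final complement flips the sign of the net change.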

5 Attacks on TCP/IP Sessions 

We consider two kinds of attacks: those where we assume the TCP packet alone
is encrypted (i.e. the encryption takes place between TCP and IP, rather than
below IP_sec as in figure 1) and those where the complete IP_sec packet is encrypted,
in accordance with figure 1. These are referred to as TCP and TCP/IP in table
4. The TCP attack is similar to the IP attack: we are solving for one variable,
$\beta$, and the equation to be solved is of degree $e^2$, which can be reduced to degree
e in the event of no checksum wrapping.



Table 4. Times for TCP reduction

NTL timings in seconds to lattice reduce
RedHat Linux 6.2 on 1GHz Pentium III with 500Mb RAM

Public exponent                        e = 3                     e = 5
RSA-type                        512    1024   2048       512    1024   2048
h (control parameter)           2      2      2          4      2      2
TCP, no checksum wrapping       1      9      27         9456   167    1384
TCP, with checksum wrapping     702    4631   5129       †      §      §
TCP/IP, no checksum wrapping    ‡      §      §          §      §      §

† Not implemented due to software restrictions. This would in fact have required
h = 5.

‡ Computation aborted after one month.

§ Not implemented due to the running time and resource constraints.

The TCP/IP attack is more complicated, as we are solving for two variables,
$\alpha$ (the change in IP identification field) and $\beta$. From a polynomial point of view,
this forms a bivariate modular equation. As pointed out by Coppersmith [4]
and Howgrave-Graham [7,8], there is a heuristic extension of their method to
bivariates. Let us assume that the IP_sec packets are $tcpip_1$ and $tcpip_2$, with one
byte of TCP data. Taking $etcpip_1$ and $etcpip_2$ to be $tcpip_1$ and $tcpip_2$ encrypted
with RSA, resultants were taken using them and $c_1$ and $c_2$ shown below, which
formed a bivariate modular polynomial.

$$c_1 = m^e$$

$$c_2 = \bigl(m + ((2^{48} - 1)\alpha \times 2^{232}) + ((2^{80} \times (64000\beta) + (1535\beta)) \times 2^{24})\bigr)^e$$

In $c_2$, the first bracketed term comes from the IP packet and the second from the
TCP packet; together they describe the change in the TCP/IP packet. The
exponent 232 is explained in note A.

A - The IP checksum field is now 232 bits from the end of the packet, as it
is 64 bits from the end of the IP header, followed by 160 bits (20 bytes) of TCP
header and 8 bits of data: 64 + 160 + 8 = 232.

Footnote 2: That is to say, when the low-order part of the sequence number is greater
than $1536\beta$, which one might expect to occur $1 - \frac{1536\beta}{65536}$ of the time.

As this is an e = 3 case we have already attacked IP and TCP with e = 3 so 
it is safe to conclude that h = 2. Therefore the polynomial is never raised to a 
greater degree than 1. 

We obtained a polynomial in $\alpha$ and $\beta$ where the maximum degree of $\beta$ is 9.
This polynomial was made monic with respect to $\beta^9$. Then using the formula
to calculate the dimensions of the matrix where j = 9, showed that a 190x190
matrix was required.

$$\dim = 2j^2 + 3j + 1 = 2 \times 9^2 + 3 \times 9 + 1 = 162 + 27 + 1 = 190$$

It is possible to calculate the number of rows which must contain N and the
number of rows which must contain the coefficients of p(x),

$$190 - \left(\frac{j^2}{2} + \frac{3j}{2} + 1\right) = 190 - 55 = 135$$

So 135 rows of the matrix contain N's (scaled by the bounds for $\alpha$ and $\beta$) on the
diagonal (and zeros elsewhere) and the remaining 55 rows contain the coefficients
of the polynomial multiplied by the different combinations of the two variables
$\alpha$ and $\beta$ and scaled by the bounds for $\alpha$ and $\beta$, so that the leading term is on
the diagonal.
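The row counts are simple arithmetic in j and can be tabulated with a small Python helper. The two formulas are the ones quoted above; the function name and the returned tuple layout are hypothetical.

```python
def lattice_rows(j):
    """Return (dimension, rows holding N, rows holding polynomial coefficients)."""
    dim = 2 * j * j + 3 * j + 1
    poly_rows = (j * j + 3 * j + 2) // 2      # j^2/2 + 3j/2 + 1
    return dim, dim - poly_rows, poly_rows

print(lattice_rows(9))
```

For j = 9 this gives the 190-dimensional lattice with 135 rows carrying N and 55 rows carrying polynomial coefficients.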

The lattice reduction is extremely costly in time and resources, and at the
time of finishing [5], the 190x190 lattice reduction process had been running for
in excess of 750 hours.

In theory though, if the LLL reduction had been successful on this large
matrix, we would have taken the first two short vectors of the LLL-reduced
matrix, divided them both by the numeric vector (of upper bounds), and formed
two simultaneous equations. These two simultaneous equations would be solved
to return $\alpha$ and $\beta$.

Next, with $\alpha$ and $\beta$ in hand, and $etcpip_1$ and $etcpip_2$ corresponding to the
two encrypted messages, we would calculate the linear polynomial

$$\gcd\bigl(z^3 - etcpip_1,\ (z + (2^{48} - 1)\alpha\, 2^{232} + (2^{80} \times 64000 + 1535)\beta \times 2^{24})^3 - etcpip_2\bigr) \bmod N$$

to recover $tcpip_1$.

6 TCP/IP Attacks Revisited

A better way to attack TCP/IP packets is by "guessing" $\beta$. $\alpha$ represents the
change in the IP identification field; as this increments by 1 every time a packet
is sent from the original sender (including packets internal to the sender's IP_sec
network, which cannot be detected outside), guessing it would be essentially
impossible due to the number of packets sent. However $\beta$ increments by 1 every
half second. Remembering that we are performing a denial of service attack, if
we, for example, sniffed two packets in the space of 4 seconds, there would be
only eight $\beta$'s to guess.

After experimentation with e = 3 and substituting $\beta \in \{1, 2, 3, 4, 5, 6, 7, 8\}$
it was clear from the resultant polynomial that this was a univariate
polynomial in $\alpha$ with maximum degree 9. Unfortunately this could not be solved
as a degree 3 polynomial in $\alpha^3$, but running eight 18x18 lattice reductions takes
significantly less time (8 x 680 = 5440 seconds) than attempting to LLL reduce
the 190x190 matrix.

Taking

$$c_2 := \bigl(m + ((2^{48}\alpha - \alpha) \times 2^{232}) + ((2^{80} \times (64000\beta) + (1535\beta)) \times 2^{24})\bigr)^3$$

But then for example guessing $\beta = 1$ gives:

$$c_2 := \bigl(m + ((2^{48}\alpha - \alpha) \times 2^{232}) + 1329207713375312221233383113029058560\bigr)^3$$

We observe empirically that lattice reducing a matrix with an unsuccessful
guess of $\beta$ takes no longer than to reduce a successful guess of $\beta$.

A polynomial is constructed and solved using LLL reduction as in the IP
case with checksum wraparound to resolve $\alpha$, and now with $\alpha$ calculated and $\beta$
guessed,

$$\gcd\bigl(z^3 - etcpip_1,\ (z + (2^{48} - 1)\alpha\, 2^{232} + (2^{80} \times 64000 + 1535)\beta \times 2^{24})^3 - etcpip_2\bigr) \bmod N$$

is calculated, which is equal to $z - tcpip_1 - \lambda N$, hence RSA encryption on these
TCP/IP packets is broken.



7 Conclusions 

Table 1 shows that the Coppersmith/Howgrave-Graham method is eminently
applicable for the decoding of low-exponent RSA-encrypted IP packets. For e = 5
we also have the paradoxical result that increasing the key length from 512
bits to 1024 actually reduces the time by a factor of 45, since a smaller lattice
(10 x 10 rather than 20 x 20, with entries modulo N rather than N^3) is involved.
Extrapolating from table 1 shows that the key size would have to rise to 4096
bits before security is improved.

Yet again, we note that the common, but flawed, implementation of TCP
initial sequence numbers is a security loophole [1,3,10]. In our case, the fact that
64000 ≈ 2^16 is an added weakness.

We also note the power of combining guessing some bits with the
Coppersmith/Howgrave-Graham method for finding others: see note (*) in IP attacks
and section 6.

More generally, we have shown that a cryptosystem built according to
standard principles of protocol layering with "standard" components displays
unexpected, and in some cases computationally trivial, vulnerabilities.

Abstract. A recursive construction is provided for sequence sets which
possess good Hamming Distance and low Peak-to-Average Power Ratio
(PAR) under any Local Unitary Unimodular Transform. We identify a
subset of these sequences that map to binary indicators for linear and
nonlinear Factor Graphs, after application of subspace Walsh-Hadamard
Transforms. Finally we investigate the quantum PAR_l measure of 'Linear
Entanglement' (LE) under any Local Unitary Transform, where optimum
LE implies optimum weight hierarchy of an associated linear code.



1 Introduction 

Golay Complementary sequences of length $2^n$ form sequences with
Peak-to-Average Power Ratio (PAR) $\le 2$ under the one-dimensional continuous Discrete
Fourier Transform ($DFT_1^\infty$) [9]. The upper PAR bound of 2 follows by
forming these Complementary Sequences using the Rudin-Shapiro construction [25,26].
This set is the union of certain quadratic cosets of Reed-Muller RM(1,n) [5].
Moreover the quadratic coset representatives can be viewed as 'line graphs' in
Algebraic Normal Form (ANF) [21]. As these sequences are a subset of RM(2,n),
the Hamming Distance, D, between sequences in the set satisfies $D \ge 2^{n-2}$. The
problem of finding error-correcting codes where each codeword also has low PAR
has application to Orthogonal Frequency Division Multiplexing (OFDM)
communications systems [11]. However the fundamental codeset identified by Davis
and Jedwab [5] (DJ sequences) suffers from vanishing rate as n increases, and
much higher rates are possible and desirable, where PAR $\le O(n)$ [27,22]. A
generalisation of the Rudin-Shapiro construction to other starting seeds [16,17] allows
inclusion of more low PAR quadratic cosets of RM(1,n) in the code, thereby
improving code rate somewhat. Higher degree cosets can also be added,
increasing code rate at the price of distance, D, which decreases. However these rate
improvements are marginal. In this paper we present a construction for much
larger codesets of sequences with PAR $\le 2^t$, comprising ANFs up to degree $\mu$,
where $\mu \le t$ for $t > 1$, and $\mu = 2$ for $t = 1$ [19]. These codesets have PAR $\le 2^t$
under all Linear Unimodular Unitary Transforms (LUUTs), including one and
multi-dimensional continuous DFTs. As LUUTs include the Walsh-Hadamard
Transform (WHT), our construction gives large codesets of Almost-Bent
functions [3,23]. The functions are cryptographically even stronger, as the
binary sequences are distant from linear sequences over all alphabets, not just
over $Z_2$. We then describe a mapping of a subset of the bipolar sequences,
generated using our construction, to Factor Graphs [12]. By applying tensor
products of Hadamard and Identity kernels to our bipolar sequence we transform to
a Factor Graph in a Normal Realisation [7] representing a linear or nonlinear
error-correcting code. This transformation provides spectral characterisation for
Factor Graphs (and Quantum Factor Graphs [15]). Finally we present PAR_l,
which is a partial measure of quantum entanglement and measures PAR under
all Linear Unitary Transforms (LUTs) [17,18]. We also define 'Linear
Entanglement' (LE), and 'Stubbornness of Entanglement' (SE), which is a series of
parameters related to PAR_l over all sequence subspaces. At least in the bipartite
quadratic case, a length $2^n$ bipolar sequence with optimal LE and SE represents
an [n, k, d] binary linear code with optimal weight hierarchy. We conjecture that
optimally entangled subsystems represent optimal linear and nonlinear codes,
and vice versa. A similar relationship between secrecy and entanglement has
recently been highlighted by [4].

2 A Construction for Low PAR Error-Correcting Codes 

Joint work with C.Tellambura [19] 

PAR is a spectral measure. We must therefore define the transforms over which 
the spectrum is computed: 

2.1 Definitions 

Definition 1 $L_n$ is the infinite set of length $2^n$ complex linear unimodular
sequences, $l = (l_0, l_1, \ldots, l_{2^n-1})$, where $|l_i| = |l_j|$, $\forall i, j$, $\sum_{i=0}^{2^n-1} |l_i|^2 = 1$, and

$$l = 2^{-n/2}\,(a_0, b_0) \otimes (a_1, b_1) \otimes \cdots \otimes (a_{n-1}, b_{n-1})$$

where $\otimes$ means 'tensor product'.

Definition 2 A $2^n \times 2^n$ Linear Unimodular Unitary Transform (LUUT)
matrix L has rows taken from $L_n$ such that $LL^\dagger = I_{2^n}$, where $\dagger$ means conjugate
transpose, and $I_{2^n}$ is the $2^n \times 2^n$ identity matrix.

Definition 3 $G_n$ is the infinite set of length $2^n$ complex linear sequences,
$l = (l_0, l_1, \ldots, l_{2^n-1})$, where $\sum_{i=0}^{2^n-1} |l_i|^2 = 1$, and

$$l = 2^{-n/2}\,(a_0, b_0) \otimes (a_1, b_1) \otimes \cdots \otimes (a_{n-1}, b_{n-1})$$

Note that $G_n \supset L_n$.

Definition 4 A $2^n \times 2^n$ Linear Unitary Transform (LUT) matrix G has rows
taken from $G_n$ such that $GG^\dagger = I_{2^n}$. LUUTs are a special case of LUT.

Let $s_i$ be an element of a length $2^n$ vector, s. PAR(s) is computed by
measuring maximum possible correlation of s with any length $2^n$ 'linear' unimodular
sequence, $l \in L_n$:

Definition 5 $PAR(s) = 2^n \max_l (|s \cdot l|^2)$

where $l \in L_n$ and $\cdot$ means 'inner product' [17].

Let $x = \{x_0, x_1, \ldots, x_{n-1}\}$. Then $p(x): Z_2^n \to Z_2$ has a bipolar representation,
$s = (-1)^{p(x)} = (s_0, s_1, \ldots, s_{2^n-1})$, where $s_i = (-1)^{p(i_0, i_1, \ldots, i_{n-1})}$
and $i = \sum_{k=0}^{n-1} i_k 2^k$ is a radix-2 decomposition of i.
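The bipolar representation and its Walsh-Hadamard spectrum are easy to compute for small n. The Python sketch below is an illustration under assumptions: it takes s to be normalised to unit energy before applying Definition 5, and it restricts the maximisation to the rows of the WHT, which is only one LUUT, so the value it returns is a lower bound on the PAR over all of L_n. The function names are hypothetical.

```python
def bipolar(p, n):
    """s_i = (-1)^p(i_0, ..., i_{n-1}), where i = sum of i_k * 2^k."""
    return [(-1) ** p(*[(i >> k) & 1 for k in range(n)]) for i in range(2 ** n)]

def wht(s):
    """Unnormalised fast Walsh-Hadamard transform of a length 2^n list."""
    s = list(s)
    h = 1
    while h < len(s):
        for start in range(0, len(s), 2 * h):
            for j in range(start, start + h):
                a, b = s[j], s[j + h]
                s[j], s[j + h] = a + b, a - b
        h *= 2
    return s

def par_wht(s):
    """2^n * max |s.l|^2 over WHT rows l, with s and l both normalised.

    With N = 2^n, s/sqrt(N) and rows of entries +-1/sqrt(N), the inner
    product is W_k / N, so the quantity is max(W_k^2) / N.
    """
    n_len = len(s)
    return max(w * w for w in wht(s)) / n_len

bent = bipolar(lambda x0, x1, x2, x3: (x0 & x1) ^ (x2 & x3), 4)
linear = bipolar(lambda x0, x1, x2, x3: x0, 4)
print(par_wht(bent), par_wht(linear))
```

The quadratic function x0x1 + x2x3 has a flat WHT spectrum and so reaches the minimum value 1 under this transform, while a linear function concentrates all its energy in one coefficient and reaches the maximum 2^n.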

2.2 Construction

This paper focuses on a special case of a more general construction. Here, all $x_i$ are two-state binary variables, and the fundamental recursion is based on Walsh-Hadamard Transform (WHT) kernels. The more general construction is presented in [19]. We now present the construction:

$p(x) = \sum_{j=0}^{L-2} \sum_{i=0}^{t-1} x_{\pi(tj+i)} f_{i,j}(x_{\pi(t(j+1))}, x_{\pi(t(j+1)+1)}, \ldots, x_{\pi(t(j+2)-1)})$
$\qquad + \sum_{j=0}^{L-1} g_j(x_{\pi(tj)}, x_{\pi(tj+1)}, \ldots, x_{\pi(t(j+1)-1)})$ (1)

where $n = Lt$, $\pi$ permutes $Z_n$, and where $f_{i,j}: Z_2^t \to Z_2$ is such that $f_{\gamma_j} = (f_{0,j}, f_{1,j}, \ldots, f_{t-1,j})$ is an invertible boolean function (permutation polynomial) from $Z_2^t \to Z_2^t$, governed by the permutation, $i' = \gamma_j(i)$, where $i' = \sum_{k=0}^{t-1} i'_k 2^k$ is a radix-2 decomposition, $i'_k = f_{k,j}(i_0, i_1, \ldots, i_{t-1})$, and each $\gamma_j$ permutes $Z_{2^t}$. To avoid unnecessary duplications, we exclude the $f_{\gamma_j}$ where one or more $f_{i,j}$ has a ’+1’ constant offset, and also the cases where all $f_{i,j}$ are monomials, except when $f_{\gamma_j}$ is the identity function.

Theorem 1 [19] The length $N = 2^n$ bipolar sequence $s = (-1)^{p(x)}$ satisfies $PAR(s) \le 2^t$ under all LUUTs, where $p$ is generated using construction (1).

Proof. (sketch) Let $m$ factor fully as $m = p_0 p_1 \cdots$, where the $p_i$ are not necessarily distinct. A length $m$ vector, $l$, is defined linear if it satisfies $l = \bigotimes_i v_i$, where length$(v_i) = p_i$, and $\sum_k |l_k|^2 = 1$. Let $E_j$ and $A_j$, $1 \le j \le L$, be a series of $N \times N$ and $N \times N^j$ complex matrices, respectively, where $A_1 = E_1$ is unitary. Let the rows of $A_{j-1}$, $(a_{0,j-1}, \ldots, a_{N-1,j-1})$, form a complementary set of $N$ sequences under any $N^{j-1} \times N^{j-1}$ unitary transform with linear unimodular rows. Let $l$ and $l_j$ be normalised linear rows of length $N^{j-1}$ and $N$, respectively. Let $r = A_{j-1} l$. Let $\gamma$ permute $Z_N$. Construct the $N \times N^j$ matrix, $A_j$, such that $a_{i,j} = ((a_{\gamma(0),j-1} | a_{\gamma(1),j-1} | \ldots | a_{\gamma(N-1),j-1}) \odot (e_{i,j} \otimes \mathbf{1}))$, where $x \odot y = (x_0 y_0, x_1 y_1, \ldots, x_{N^j-1} y_{N^j-1})$, $\mathbf{1}$ is the length $N^{j-1}$ all-ones vector, $e_{i,j}$ is the $i$th row of $E_j$, and ’|’ means concatenation. The rows of $A_j$ form a complementary $N$-set under any unitary transform if $r' = A_j (l_j \otimes l)$ satisfies,

This follows if $|\sum_{k=0}^{N-1} (r_{\gamma(k)} e_{i,k} l_k)|^2 = 1$, for $r_k$, $e_{i,k}$ and $l_k$ elements of $r$, $e_{i,j}$ and $l_j$, respectively. This is true if $E_j$ is unitary, and if $e_{i,j} \odot l_j$ is unimodular, which follows if $e_{i,j}$ and $l_j$ are unimodular. Construction (1) occurs when successive $A_j$ are recursively generated, where all $E_j$ are $2^t \times 2^t$ WHTs. The $\gamma$ permutation essentially maps to $f_\gamma$, and concatenation is widened to a more general permutation, $\pi$, over all linear variables. □



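Theorem 2 For a fixed $t$, let $P$ be the codeset of length $2^n$ binary sequences of degree $\mu$ or less, generated using (1). Then,

$\frac{|P|}{2^{n+1}} \le \frac{\Gamma^{L-1}\, n!\, (2^{\binom{t}{2}})^L}{2\,(t!)^L}, \qquad \mu = 2$

$\frac{|P|}{2^{n+1}} \le \frac{((2^t-1)!)^{L-1}\, n!\, (2^{2^t-t-1})^L}{2\,(t!)^L}, \qquad \mu > 2$ (2)

where $\Gamma = \prod_{i=0}^{t-1} (2^t - 2^i) = |GL(t,2)|$. (GL is the General Linear Group). (Only for $t = 1$ is the upper bound exact).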



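Proof. By counting arguments we can show that, for $\mu = 2$,

$\frac{|P|}{2^{n+1}} \le \frac{n!}{2\,(t!)^L} \times \Gamma^{L-1} \times (2^{\binom{t}{2}})^L$

For $\mu > 2$, we replace $\Gamma$ with $(2^t-1)!$, which is the number of permutations excluding those with a constant offset, ’+1’. Likewise $2^{\binom{t}{2}}$ becomes $2^{2^t-t-1}$, which counts the $g_j$ of unrestricted degree up to affine terms. The Theorem follows. □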
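The bound can be evaluated numerically and compared with the tabulated values later in this section. The sketch below assumes that (2) reads $|P|/2^{n+1} \le n!\,\Gamma^{L-1}(2^{\binom{t}{2}})^L / (2(t!)^L)$ for $\mu = 2$, with $\Gamma$ replaced by $(2^t-1)!$ and $2^{\binom{t}{2}}$ by $2^{2^t-t-1}$ for $\mu > 2$; this reading is an assumption, chosen because it agrees with the figures quoted in Tables 1 to 3. The function names are illustrative only.

```python
from math import comb, factorial, log2


def gl_order(t):
    """|GL(t,2)| = prod_{i=0}^{t-1} (2^t - 2^i)."""
    order = 1
    for i in range(t):
        order *= 2**t - 2**i
    return order


def coset_leader_bound(n, t, quadratic_only=True):
    """Assumed reading of bound (2) on |P| / 2^(n+1), with n = L * t."""
    if n % t:
        raise ValueError("n must be a multiple of t")
    L = n // t
    if quadratic_only:                      # mu = 2
        perms = gl_order(t) ** (L - 1)
        offsets = (2 ** comb(t, 2)) ** L
    else:                                   # mu > 2
        perms = factorial(2**t - 1) ** (L - 1)
        offsets = (2 ** (2**t - t - 1)) ** L
    return factorial(n) * perms * offsets // (2 * factorial(t) ** L)


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, coset_leader_bound(n, 2))
    for n in (6, 9, 12, 15):
        print(n, round(log2(coset_leader_bound(n, 3)), 1),
              round(log2(coset_leader_bound(n, 3, quadratic_only=False)), 1))
```

For $t = 1$ the same function returns $n!/2$, the size of the DJ set of quadratic coset leaders.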

In Section 2.4 we show how to generate all degree-one permutation polynomials, via an isomorphism to the General Linear Group, where the number of degree-one permutation polynomials is $\Gamma$.



2.3 Examples 

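The $2^n \times 2^n$ Walsh-Hadamard (WHT) and Negahadamard (NHT) Transform matrices are $\bigotimes_{i=0}^{n-1} H$ and $\bigotimes_{i=0}^{n-1} N$, respectively, where $H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $N = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}$, and $i^2 = -1$. DFT$^\infty$ is the set of $2^n \times 2^n$ matrices, the union of whose rows form a subset of $L_n$ such that each row satisfies $a_i = 1$, $b_i = \omega^{2^i k}$ for some fixed $k$, and $\omega$ is a complex root of unity (see Definition 1). These three transforms are used as ’spot-checks’ in the examples to validate the PAR upper-bound.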



Example 1. Let $f_{\gamma_j}$ be the identity permutation $\forall j$. Then, $f_{\gamma_j} = (x_{\pi(t(j+1))}, x_{\pi(t(j+1)+1)}, \ldots, x_{\pi(t(j+2)-1)})$ and (1) becomes,

$p(x) = \sum_{j=0}^{L-2} \sum_{i=0}^{t-1} x_{\pi(tj+i)} x_{\pi(t(j+1)+i)} + \sum_{j=0}^{L-1} g_j(x_{\pi(tj)}, x_{\pi(tj+1)}, \ldots, x_{\pi(t(j+1)-1)})$ (3)

When $\deg(g_j) < 2$, $\forall j$, it is well-known that $s = (-1)^{p(x)}$ is Bent (PAR = 1 under the WHT) for $L$ even [14] and (perhaps not known) that $s$ has PAR $= 2^t$ under the WHT for $L$ odd. In general, for any $g_j$, $s$ has PAR $\le 2^t$ under all LUUTs. For example, if $L = 4$ and,

$p(x) = x_0x_3 + x_1x_4 + x_2x_5 + x_3x_6 + x_4x_7 + x_5x_8 + x_6x_9 + x_7x_{10} + x_8x_{11}$

then $s = (-1)^{p(x)}$ has PAR = 1.0 under the WHT, PAR = 1.0 under the NHT, and PAR = 7.09 under DFT$^\infty$. Similarly, let $g_0(x_0,x_1,x_2) = x_1x_2$, $g_1(x_3,x_4,x_5) = x_3x_4x_5$, and $g_2(x_6,x_7,x_8) = 0$. Then $s' = (-1)^{p(x)+g_0+g_1+g_2}$ has PAR = 4.0 under the WHT, PAR = 2.0 under the NHT, and PAR = 7.54 under DFT$^\infty$. In all cases, PAR $\le 8.0$ under any LUUT.
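The WHT spot-check in this example is easy to reproduce. The following is a minimal sketch, not the authors' code: it builds the bipolar sequence from a list of monomials and evaluates PAR under the WHT as $\max_k |S_k|^2 / 2^n$ for the unnormalised spectrum $S$. Taking bit $k$ of the sequence index as $x_k$ is an assumed ordering, and the function names are illustrative.

```python
def bipolar(n, monomials):
    """s_i = (-1)^p(i); bit k of the index i is taken as x_k (assumed ordering).

    Each monomial is a tuple of variable indices, e.g. (0, 3) for x0*x3;
    the empty tuple () stands for the constant 1.
    """
    s = []
    for i in range(2 ** n):
        p = 0
        for mono in monomials:
            if all((i >> k) & 1 for k in mono):
                p ^= 1
        s.append(-1 if p else 1)
    return s


def wht(s):
    """Unnormalised fast Walsh-Hadamard transform of a length 2^n list."""
    out = list(s)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    return out


def par_wht(s):
    """PAR under the WHT: max_k |S_k|^2 / 2^n."""
    return max(v * v for v in wht(s)) / len(s)


# Example 1 with L = 4, t = 3: p(x) = sum_{i=0}^{8} x_i x_{i+3}
EXAMPLE_1 = [(i, i + 3) for i in range(9)]

if __name__ == "__main__":
    print(par_wht(bipolar(12, EXAMPLE_1)))
```

The NHT and DFT$^\infty$ figures need complex arithmetic and a search over $\omega$, which this sketch does not attempt.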

Example 2, PAR $\le$ 2.0. Let $t = 1$. Then we have one possible permutation polynomial, namely, $f_{\gamma_j} = x$, (we exclude $f_{\gamma_j} = x + 1$). From (1) we obtain,

$p(x) = \sum_{j=0}^{L-2} x_{\pi(j)} x_{\pi(j+1)} + \sum_{j=0}^{L-1} c_j x_j + d, \qquad c_j, d \in Z_2$ (4)

This is exactly the DJ set of binary quadratic cosets of RM(1,n), where $n = L$ [5]. This set has PAR $\le 2.0$ under DFT$^\infty$ [5]. Such sequences are Bent for $n$ even [14,23] and, in [16,17] it was shown that such a set has PAR = 2.0 under the WHT for $n$ odd, and also, under the NHT, has PAR = 1.0 for $n \not\equiv 2 \bmod 3$ (NegaBent), and PAR = 2.0 for $n \equiv 2 \bmod 3$. More generally the DJ set has PAR $\le 2.0$ under any LUUT [17], and this agrees with Theorem 1. For example, let $p(x) = x_0x_4 + x_4x_1 + x_1x_2 + x_2x_3 + x_1 + 1$. Then $s = (-1)^{p(x)}$ has PAR = 2.0 under the WHT, PAR = 2.0 under the NHT, and PAR = 2.0 under DFT$^\infty$. The DJ set, being cosets of RM(1,n) in RM(2,n), forms a codeset with Hamming Distance, $D \ge 2^{n-2}$. The primary drawback of the DJ codeset is that its code rate vanishes rapidly as $n$ increases.
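The number of distinct quadratic parts in (4) can be checked by brute force. This sketch enumerates all permutations $\pi$ of $Z_n$ and collects the distinct edge sets $\{\pi(j), \pi(j+1)\}$; a permutation and its reversal give the same quadratic form, so $n!/2$ forms are expected. The function name is illustrative.

```python
from itertools import permutations
from math import factorial


def dj_quadratic_forms(n):
    """Distinct quadratic parts sum_j x_{pi(j)} x_{pi(j+1)} over all permutations pi of Z_n.

    Each form is stored as a frozenset of unordered index pairs.
    """
    forms = set()
    for pi in permutations(range(n)):
        edges = frozenset(frozenset((pi[j], pi[j + 1])) for j in range(n - 1))
        forms.add(edges)
    return forms


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, len(dj_quadratic_forms(n)), factorial(n) // 2)
```

The enumeration is only practical for small $n$, since it visits all $n!$ permutations.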

Example 3, PAR $\le$ 4.0. [5,22,17,23] have all proposed techniques for the inclusion of further quadratic cosets, so as to improve rate at the price of increased PAR. We here propose an improved rate code (although still vanishing), where PAR $\le 4.0$. To achieve this we set $t = 2$ in (1). There are $(2^2-1)!/2! = 3$ valid permutation polynomials, $f_\gamma = (f_0, f_1)$. These polynomials map from $Z_2^2 \to Z_2^2$, and are taken from the set,

$f_\gamma(x_0,x_1) \in \{(x_0,x_1), (x_0 + x_1, x_1), (x_0, x_0 + x_1)\}$

Substituting for $f_{i,j}$ and $g_j$ in (1) gives a large set of polynomials with PAR $\le 4.0$ under all LUUTs. We now list, for this construction, the $p(x)$ arising from the 3 invertible polynomial functions, $f_{\gamma_j}$, for one ’section’ of the polynomial, i.e. for $L = 2$, where we fix $\pi$ to the identity permutation.

$p(x) = x_0x_2 + x_1x_3 + c_0x_0x_1 + c_1x_2x_3 + RM(1,4)$
$p(x) = x_0(x_2 + x_3) + x_1x_3 + c_0x_0x_1 + c_1x_2x_3 + RM(1,4)$
$p(x) = x_0x_2 + x_1(x_2 + x_3) + c_0x_0x_1 + c_1x_2x_3 + RM(1,4)$

where $c_0, c_1 \in Z_2$. The quadratic part of each of these 3 functions is isomorphic to a distinct invertible boolean $t \times t$ matrix, where $t = 2$ (Section 2.4), as the permutation polynomials form a group which is isomorphic to the General Linear Group, $GL(t,2)$, where $|GL(t,2)| = \prod_{i=0}^{t-1}(2^t - 2^i)$ [13]. Two of the 3 quadratic functions are inequivalent under permutation of the four variable indices, e.g.,

$p(x) = x_0x_2 + x_1x_3 + c_0x_0x_1 + c_1x_2x_3 + RM(1,4)$
$p(x) = x_0(x_2 + x_3) + x_1x_3 + c_0x_0x_1 + c_1x_2x_3 + RM(1,4)$

An upper bound on $|P|$ is given by Theorem 2, (2). Substituting $t = 2$ into (2),

$\frac{|P|}{2^{n+1}} \le n!\, 2^{n/2-2}\, 3^{n/2-1}$ (5)



An exact enumeration and construction for this set remains open, due to extra ’hidden’ symmetries. Computationally we are able to calculate the exact number of quadratic coset leaders for $n = 4, 6, 8, 10$, and these are compared to the upper bound of (5) in Table 1. They are also compared to the number of quadratic coset leaders, ($= n!/2$), in the binary DJ codeset (Example 2).

Table 1. The Number of Quadratic Coset Leaders for Construction (1) when t = 2

n                                      4      6        8          10
Theorem 2, (5),(2), $|P|/2^{n+1}$      72     12960    4354560    2351462400
Exact Computation                      36     9240     4086096    2317593600
DJ Code, $n!/2$                        12     360      20160      1814400
$\log_2(|P|/2^{n+1})$                  6.2    13.7     22.1       31.1
$\log_2$(Number of quadratics)         6      15       28         45

By assigning $t = 2$ we have a construction for a much larger codeset than the DJ codeset and with the same Hamming Distance, $D \ge 2^{n-2}$, but the price paid is that the PAR is now upper-bounded by 4.0 instead of 2.0. For example, let,
p{x) = *02:2 -I- *1*2 -f *12:6 -I- X 2 Xs + X 3 X 3 + 2;e2;5 -I- 2:52:4 + 2:32:7 -I- 2:02:1 -I- 2:52:3 -I- 2:7 -f 2:1 
Then $s = (-1)^{p(x)}$ has PAR = 1.0 under the WHT, PAR = 2.0 under the NHT, and PAR = 3.43 under DFT$^\infty$.



Example 4, PAR $\le$ 8.0. Set $t = 3$ in (1). There are now $(2^3-1)!/3! = 840$ valid permutation polynomials, $f_\gamma = (f_0, f_1, f_2)$. These polynomials map from $Z_2^3 \to Z_2^3$. Moreover, $(2^3 - 1)(2^3 - 2)(2^3 - 2^2)/t! = 168/6 = 28$ of the polynomials are degree-one permutations leading to quadratic forms, $p(x)$, and can be represented by the following 7 permutation polynomials.

$f_\gamma(x_0,x_1,x_2) \in \{$
$(x_0, x_1, x_2)$, $(x_0 + x_2, x_1, x_2)$, $(x_0 + x_2, x_1 + x_2, x_2)$, $(x_0 + x_1 + x_2, x_1, x_2)$,
$(x_0 + x_1, x_1 + x_2, x_2)$, $(x_0 + x_1 + x_2, x_1 + x_2, x_2)$, $(x_0 + x_2, x_1 + x_0, x_2 + x_0 + x_1)\}$

Substituting for $f_{i,j}$ and $g_j$ in (1) gives a large set of polynomials with PAR $\le 8.0$ under all LUUTs. We now list, for this construction, all quadratic $p(x)$ arising from the 7 inequivalent degree-one permutation polynomials, $f_\gamma$, for one ’section’ of the polynomial, i.e. for $L = 2$, where $\pi$ is fixed as the identity permutation.

$p(x) = x_0x_3 + x_1x_4 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_5 + x_1x_4 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_5 + x_1x_4 + x_1x_5 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_4 + x_0x_5 + x_1x_4 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_4 + x_1x_4 + x_1x_5 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_4 + x_0x_5 + x_1x_4 + x_1x_5 + x_2x_5 + g(x)$
$p(x) = x_0x_3 + x_0x_5 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_2x_5 + g(x)$

where $g(x) = c_0x_0x_1 + c_1x_0x_2 + c_2x_1x_2 + c_3x_0x_1x_2 + c_4x_3x_4 + c_5x_3x_5 + c_6x_4x_5 + c_7x_3x_4x_5 + RM(1,6)$, and $c_0, c_1, \ldots, c_7 \in Z_2$. An upper bound to $|P|$ can be computed from Theorem 2, (2), and the upper bound is compared to the total number of quadratics in $n$ binary variables in Table 2.

Table 2. The Number of Quadratic Coset Leaders for Construction (1) when t = 3

n                                           6       9       12      15
Theorem 2, (2), $\log_2(|P|/2^{n+1})$       16.7    33.5    51.7    70.9
$\log_2$(Number of quadratics)              15      36      66      105

As with $t = 2$, an exact enumeration and construction for this set remains open, due to extra ’hidden’ symmetries. By assigning $t = 3$ we have a construction for a codeset with Hamming Distance, $D \ge 2^{n-2}$ and PAR $\le 8.0$ under all LUUTs.
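The seven quadratic forms listed in this example follow mechanically from the seven degree-one permutation polynomials. As a minimal sketch (assuming one section, $L = 2$, $\pi$ the identity, and omitting $g(x)$), the cross terms are $\sum_i x_i f_i(x_t, \ldots, x_{2t-1})$; the list `SEVEN` transcribes the seven $f_\gamma$ given above, and the helper name is illustrative.

```python
def cross_terms(f_rows, t):
    """Quadratic part sum_i x_i * f_i(x_t, ..., x_{2t-1}) for one section.

    f_rows[i] lists the variable indices (0..t-1) whose sum is f_i.
    Returns a sorted list of pairs (a, b), each standing for the monomial x_a x_b.
    """
    return sorted((i, t + j) for i, row in enumerate(f_rows) for j in row)


# The seven inequivalent degree-one permutation polynomials for t = 3.
SEVEN = [
    [[0], [1], [2]],
    [[0, 2], [1], [2]],
    [[0, 2], [1, 2], [2]],
    [[0, 1, 2], [1], [2]],
    [[0, 1], [1, 2], [2]],
    [[0, 1, 2], [1, 2], [2]],
    [[0, 2], [1, 0], [2, 0, 1]],
]

if __name__ == "__main__":
    for rows in SEVEN:
        print(" + ".join(f"x{a}x{b}" for a, b in cross_terms(rows, 3)))
```

The same helper reproduces the three forms of Example 3 when called with $t = 2$.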

For $t = 3$ we can also include cubic forms in Construction (1). There are $(5040-168)/3! = 812$ degree 2 permutation polynomials, $f_\gamma = (f_0, f_1, f_2)$, that map from $Z_2^3 \to Z_2^3$, and lead to cubic forms, $p(x)$. This set can be represented by 147 degree 2 permutation polynomials which are inequivalent under variable permutation, and these are listed at [20]. (Along with the 7 inequivalent degree 1 permutation polynomials, this makes a total of 154 inequivalent permutation polynomials for $t = 3$ [10,28]). Substituting for $f_{i,j}$ and $g_j$ in (1) gives a large set of polynomials with PAR $\le 8.0$ under all LUUTs, and Hamming Distance, $D \ge 2^{n-3}$. An upper bound to $|P|$ can be computed from Theorem 2, (2), and the upper bound is compared to the total number of quadratics and cubics in $n$ binary variables in Table 3. Here is an example from this codeset, where ijk,uv is short for $x_ix_jx_k + x_ux_v$. Let,

p(x) = 034, 035, 045, 135, 145, 234, 235, 245, 367, 368, 378, 567, 568, 69A, 79A, 7AB,
89A, 345, 9AB, 03, 05, 14, 24, 25, 36, 38, 47, 58, 69, 6A, 6B, 7A, 7B, 89, 8B, 67, 78, AB

then $s = (-1)^{p(x)}$ has PAR = 4.0 under the WHT, PAR = 6.625 under the NHT, and PAR = 7.66 under DFT$^\infty$. In all cases, PAR $\le 8.0$.

Table 3. The Number of Cubic and Quadratic Coset Leaders for Construction (1) when t = 3

n                                             6       9       12      15
Theorem 2, (2), $\log_2(|P|/2^{n+1})$         23.6    46.3    70.4    95.5
$\log_2$(Number of quadratics and cubics)     35      120     286     560

2.4 A Matrix Construction for All Quadratic Codes from (1) 

Each degree-one permutation polynomial, $f_\gamma$ from $Z_2^t \to Z_2^t$ can be viewed as a $t \times t$ binary adjacency matrix. Let $x = \{x_0, x_1, \ldots, x_{t-1}\}$. We can write,

$M \leftrightarrow f_\gamma(x) = (f_0(x), f_1(x), \ldots, f_{t-1}(x))$, $M = \{m_{i,j}\}$, $\deg(f_i(x)) = 1$, and
$m_{i,j} = 1$ if $x_j \in f_i(x)$, $m_{i,j} = 0$ otherwise

The mapping is an isomorphism from the degree-one permutation polynomials to the General Linear Group, $G = GL(t,2)$, of all binary $t \times t$ invertible matrices [13]. To construct all quadratic sequences, $p(x)$, for a given $n$ and $t$ we need to construct all degree one permutation polynomials, $f_\gamma$. These can, in turn, be constructed by generating all members of $G = GL(t,2)$, and this is accomplished as follows [1,2].

Definition 6 A binary $t \times t$ ’transvection’ matrix, $X_{ab}$, satisfies,

$X_{ab} = \{u_{ij}\}$, where
$u_{ij} = 1$ for $i = j$, and for $i = a$, $j = b$; $u_{ij} = 0$ otherwise

Definition 7 The Borel subgroup of $G$ over $Z_2$ is the $t \times t$ upper-triangular binary matrices, $B$.

Definition 8 The Weyl subgroup of $G$ is the $t \times t$ permutation matrices, $W$.

Assign a fixed ordering, $O$, to the matrices, $X_{ab}$, $a < b$. Let $w \in W$ be a permutation of $Z_t$ and its associated $t \times t$ permutation matrix. For each $w$, form the matrix product, $X_w$, comprising all $X_{ab}$ which satisfy $a < b$ and $w(a) > w(b)$, where the $X_{ab}$ in $X_w$ are ordered according to $O$.

Theorem 3 [1,2]

$G = X'_w W B$ (6)

where $X'_w$ is any sub-product of $X_w$ that maintains the ordering of the $X_{ab}$ matrices in $X_w$. This is the ’Bruhat’ decomposition.
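The decomposition can be exercised directly for small $t$. The sketch below is one possible reading, under stated assumptions: the permutation matrix of $w$ has its 1 in column $w(i)$ of row $i$, the ordering $O$ is lexicographic in $(a, b)$, and every invertible upper-triangular binary matrix has a unit diagonal. Under these assumptions each product (order-preserving sub-product of transvections) × (permutation matrix) × (Borel element) should appear exactly once, so the number of distinct results should equal $|GL(t,2)|$. All function names are illustrative.

```python
from itertools import combinations, permutations, product


def identity(t):
    return tuple(tuple(int(i == j) for j in range(t)) for i in range(t))


def matmul(a, b):
    """Product of two t x t binary matrices over Z_2."""
    t = len(a)
    return tuple(
        tuple(sum(a[i][k] & b[k][j] for k in range(t)) & 1 for j in range(t))
        for i in range(t)
    )


def transvection(t, a, b):
    """X_ab: ones on the diagonal and at position (a, b)."""
    return tuple(
        tuple(int(i == j or (i, j) == (a, b)) for j in range(t)) for i in range(t)
    )


def perm_matrix(w):
    """Assumed convention: row i has its single 1 in column w(i)."""
    t = len(w)
    return tuple(tuple(int(j == w[i]) for j in range(t)) for i in range(t))


def borel(t):
    """All invertible upper-triangular binary matrices (unit diagonal over Z_2)."""
    above = [(i, j) for i in range(t) for j in range(i + 1, t)]
    for bits in product((0, 1), repeat=len(above)):
        m = [list(row) for row in identity(t)]
        for (i, j), bit in zip(above, bits):
            m[i][j] = bit
        yield tuple(tuple(row) for row in m)


def gl_by_bruhat(t):
    """Generate GL(t,2) as the set of products X'_w * w * b."""
    group = set()
    borel_elements = list(borel(t))
    for w in permutations(range(t)):
        # transvections X_ab with a < b and w(a) > w(b), in lexicographic order
        inversions = [(a, b) for a in range(t) for b in range(a + 1, t) if w[a] > w[b]]
        pw = perm_matrix(w)
        for size in range(len(inversions) + 1):
            for subset in combinations(inversions, size):  # order-preserving sub-products
                x = identity(t)
                for a, b in subset:
                    x = matmul(x, transvection(t, a, b))
                xw = matmul(x, pw)
                for b_elem in borel_elements:
                    group.add(matmul(xw, b_elem))
    return group


if __name__ == "__main__":
    for t in (2, 3):
        print(t, len(gl_by_bruhat(t)))
```

Each matrix produced this way corresponds, via the mapping $M \leftrightarrow f_\gamma$, to one degree-one permutation polynomial.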



All quadratic constructions using (1) can be constructed using Theorem 3, where

$|G| = \Gamma = \prod_{i=0}^{t-1} (2^t - 2^i)$.

3 Graphical Representations



Joint work with V.Rijmen [18]

We now identify a subset of the length $2^n$ sequence constructions of (1), where $(-1)^{p(x)}$ exhibits a bipolar $\leftrightarrow$ binary equivalence under transform by a tensor product of combinations of $H$ and $I$ $2 \times 2$ matrices. The resultant length $2^n$ binary sequences can be interpreted as indicators for binary linear or nonlinear $[n, k, d]$ error-correcting codes. In such cases, $p(x)$ is closely related to a Normal Realisation for the Factor Graph of the associated $[n,k,d]$ code [7]. Let $s = (-1)^{p(x)}$.

Definition 9 ”$H$ acting on $i$” means the action of the $2^n \times 2^n$ transform, $I \otimes \ldots \otimes I \otimes H \otimes I \otimes \ldots \otimes I$ on $s$, where $H$ is preceded by $i$ $I$ matrices, and followed by $n - i - 1$ $I$ matrices. We write this as $H(i)$, or $H(i)[s]$.

Definition 10 Let $T_C$, $T_{C^\perp}$ be integer sets chosen so that $T_C \cap T_{C^\perp} = \emptyset$, and $T_C \cup T_{C^\perp} = \{0, 1, \ldots, n-1\}$. This is a bipartite splitting of $\{0, 1, \ldots, n-1\}$. Let us also partition the variable set $x$ as $x = x_C \cup x_{C^\perp}$, where $x_C = \{x_i | i \in T_C\}$, and $x_{C^\perp} = \{x_i | i \in T_{C^\perp}\}$.

Definition 11 $K_p$ is the set of all $s(x)$ of the form $s(x) = (-1)^{p(x)}$, where $p(x) = \sum_k q_k(x_C) r_k(x_{C^\perp})$, where $\deg(q_k(x_C)) = 1$ $\forall k$, and where $x_i \in p(x)$, $\forall i \in \{0, 1, \ldots, n-1\}$. We refer to $K_p$ as the set of ’half-linear bipartite bipolar’ states. $\ell_p$ is the subset of $K_p$ where $\deg(r_k(x_{C^\perp})) = 1$ $\forall k$.

Theorem 4 [18] Let $m(x)$ be a binary ANF. If $s(x) \in K_p$, then the action of $\prod_{i \in T_C} H(i)$ on $s(x)$ gives $s'(x) = m(x)$. If $s(x) \in \ell_p$, then the action of $\prod_{i \in T_{C^\perp}} H(i)$ on $s(x)$ gives $s''(x) = m(x)$. $s'(x)$ ($s''(x)$) is the binary indicator for a binary linear or nonlinear $[n, n - |T|, d]$ error correcting code, $C$.

Theorem 4 is particularly relevant when $p(x)$ is constructed using (1), as the ’strongest’ members of $K_p$ are generated as a subclass of the construction if $\deg(g_j) < 2$, $\forall j$. (By considering matrices other than $H$ it is conjectured that it is always possible to convert a bipolar sequence, $s = (-1)^{p(x)}$, constructed using (1) to a binary form, even when $\deg(g_j) \ge 2$). If $s$ can be transformed to a binary linear indicator, $s'$, using only tensor products of $H$ and $I$, then we say that $s$ is ’HI-equivalent to’ $s'$.

Theorem 5 [18] The set $\ell_p$ is HI-equivalent to the set of $[n,k,d]$ binary linear codes.

3.1 Examples

Example A. Let $t = 2$, $L = 3$. Then (1) can generate,

$p(x) = x_0x_2 + x_1x_3 + x_2x_4 + x_3x_5 + x_2x_5$

Let $T_C = \{0, 1, 4, 5\}$ and $T_{C^\perp} = \{2, 3\}$. Applying $H(0)H(1)H(4)H(5)$ (in any order) to $s = (-1)^{p(x)}$ gives the binary sequence, $s' = m(x) = (x_0 + x_2 + 1)(x_1 + x_3 + 1)(x_2 + x_4 + 1)(x_2 + x_3 + x_5 + 1)$, which is the indicator for a [6,2,2] binary linear code, $C$. Graphical representations for $s$ and $s'$ are shown in Fig 1, where the graph for $s'$ is a Normal Realisation of a Factor Graph [7]. If, instead, we apply $H(2)H(3)$ (in any order) to $s = (-1)^{p(x)}$, we get the binary sequence, $s'' = m(x) = (x_0 + x_2 + x_4 + x_5 + 1)(x_1 + x_3 + x_5 + 1)$, which is the indicator for a [6,4,2] binary linear code, $C^\perp$, the dual of $C$. Applying $H(0)H(1)H(4)H(5)$ to $s'$, followed by $H(2)H(3)$, gives $s''$. This is the same as applying the WHT to $s'$, and it is known that binary indicators of a linear code, $C$, and its dual, $C^\perp$, are related by the WHT [14].
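The bipolar to binary conversion in Example A can be reproduced in a few lines. This is a minimal sketch with illustrative names: it applies the unnormalised kernel $H$ to selected variables only and checks where the result is non-zero. Taking bit $k$ of the sequence index as $x_k$ is an assumed ordering; the supports are then compared with the zero sets of the linear factors of $s'$ and $s''$ given in the example.

```python
def bipolar(n, monomials):
    """s_i = (-1)^p(i); bit k of the index i is taken as x_k (assumed ordering)."""
    s = []
    for i in range(2 ** n):
        p = 0
        for mono in monomials:
            if all((i >> k) & 1 for k in mono):
                p ^= 1
        s.append(-1 if p else 1)
    return s


def apply_h(s, k):
    """Apply the unnormalised kernel [[1, 1], [1, -1]] to variable x_k only, i.e. H(k)[s]."""
    out = list(s)
    bit = 1 << k
    for idx in range(len(out)):
        if not idx & bit:
            a, b = out[idx], out[idx | bit]
            out[idx], out[idx | bit] = a + b, a - b
    return out


P_A = [(0, 2), (1, 3), (2, 4), (3, 5), (2, 5)]   # p(x) of Example A


def transform(subset):
    """Apply H(k) for every k in subset to s = (-1)^p(x)."""
    s = bipolar(6, P_A)
    for k in subset:
        s = apply_h(s, k)
    return s


def support(s):
    return {i for i, v in enumerate(s) if v}


if __name__ == "__main__":
    s1 = transform((0, 1, 4, 5))
    s2 = transform((2, 3))
    print(sorted(format(i, "06b")[::-1] for i in support(s1)))  # x0..x5, left to right
    print(len(support(s2)))
```

Up to a constant scale factor the transformed sequences are 0/1 indicators, which is what ’HI-equivalent’ expresses.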



Example B. Let $t = 3$, $L = 3$. Then (1) can generate,

p(x) = 034, 035, 045, 134, 135, 145, 234, 235, 245, 03, 05, 14, 15, 36, 47, 58

Let $T_C = \{0, 1, 2, 6, 7, 8\}$ and $T_{C^\perp} = \{3, 4, 5\}$. Applying $H(0), H(1), H(2), H(6), H(7), H(8)$ (in any order) to $s = (-1)^{p(x)}$ gives,

$s' = m(x) = (x_0 + x_3x_4 + x_3x_5 + x_4x_5 + x_3 + x_5 + 1)(x_1 + x_3x_4 + x_3x_5 + x_4x_5 + x_4 + x_5 + 1)$
$\qquad \times (x_2 + x_3x_4 + x_3x_5 + x_4x_5 + 1)(x_3 + x_6 + 1)(x_4 + x_7 + 1)(x_5 + x_8 + 1)$

which is the indicator for a [9, 3, 3] binary nonlinear code, $C$. Graphical representations for $s$ and $s'$ are shown in Fig 1, where the graph for $s'$ is a Normal Realisation of a nonlinear Factor Graph. In this case application of $H(3)H(4)H(5)$ does not produce the dual code, $C^\perp$, but the nonlinear dual could be obtained by nonlocal transform over $x_3, x_4, x_5$.

Fig. 1. Bipolar $\leftrightarrow$ Factor Graph HI-Equivalence for Examples A and B

Example C. The nonlinear [16,8,6] Nordstrom-Robinson binary code is HI-equivalent to a half-linear bipolar bipartite sequence, $(-1)^{p(x)}$, where $p(x)$ can be constructed using (1), and has ANF comprising 96 cubic and 40 quadratic terms, and where $|T_C| = |T_{C^\perp}| = 8$. The quadratic part of $p(x)$ is HI-equivalent to a binary linear [16,8,4] code, so we can view the 96 cubic terms of $p(x)$ as further ’doping’ to increase Hamming Distance, $d$, from 4 to 6.

3.2 Comments 

This section has identified an important subset of $K_p$ as a subset of the construction of (1), where a member of $K_p$ can be transformed to a binary sequence under selective action of $H$. Conversely, this gives us a way of analysing a Factor Graph, by transforming it back into bipolar sequence form. A natural question to ask is which length $2^n$ bipolar sequences are transform-equivalent to the best $[n, k, d]$ linear and nonlinear codes? We offer the following conjecture.

Conjecture 1 Optimal linear or nonlinear codes can be constructed from (1) if $L = 2$, and $(-1)^{g_j}$ is, itself, HI-equivalent to an optimal linear or nonlinear code, $\forall j$. But what $f_{\gamma_j}$ should be chosen?

In the next section we pose the related question: Which quantum $n$-qubit states have optimal Linear Entanglement?

4 PAR$_l$ and Quantum ’Linear’ Entanglement (LE)

Joint work with V.Rijmen [18]

In previous sections our PAR metric has been measured relative to all LUUTs. Quantum systems require that we compute our PAR metric (now called PAR$_l$) relative to all LUTs, of which LUUTs are a subset. It is argued in [18] that PAR$_l$ and Linear Entanglement (LE) are good partial measures of quantum entanglement.$^1$ Let $s$ be a length $2^n$ bipolar sequence. In the context of quantum systems we interpret (after appropriate normalisation) this sequence as a probability density function of an $n$-qubit quantum state. Let $s_i$ be an element of $s$. Then $|s_i|^2$ is the probability of measuring the quantum system in state $i$. We must normalise so that $\sum_{i=0}^{2^n-1} |s_i|^2 = 1$, although normalisation constants are usually omitted in this paper. An $n$-qubit state, $s$, contains entanglement if $s$ is not a member of $G_n$. The definition of PAR$_l$ is then identical to Definition 5 except that, now, $|l_i|$ does not have to equal $|l_j|$, i.e. $l$ is not necessarily unimodular.

Definition 12 $PAR_l(s) = 2^n \max_l (|s \cdot l|^2)$

where $l$ is any normalised linear sequence from the set, $G_n$, and $\cdot$ means ’inner product’ [17,18].

$^1$ Quantum information theorists often consider ’mixed-state’ entanglement, where entanglement with the environment is unavoidable [24,8]. This is similar to the analysis of classical communications codes in the context of a corrupting channel. In this paper we only consider a closed (pure) quantum system with no environmental entanglements [6].

Linear Entanglement (LE) is then defined as,

Definition 13 $LE(s) = n - \log_2(PAR_l(s))$

Entanglement and LE are invariant under transformation of $s$ by any LUT. Therefore PAR$_l$ is Local Unitary (LU)-invariant, and two states, $s$ and $s'$, related by a transform from LUT, are LU-equivalent. Code duality under the WHT and the HI-equivalence between $s$ and $s'$, as discussed in Section 3, are special cases of LU-equivalence. One can also view entanglement invariance as a generalisation of code duality.

4.1 PAR$_l$ for States from $\ell_p$

Theorem 6 [18] If $s \in \ell_p$, then $s$ is LU equivalent to the indicator for an $[n, k, d]$ binary linear code, and,

$PAR_l(s) \ge 2^r$, where $r = \max(k, n - k)$

Theorem 6 implies that states, $s$, from $\ell_p$ have a minimum lower bound on PAR$_l$ (upper bound on LE) when the associated $[n,k,d]$ code, $C$, satisfies $k = \lfloor n/2 \rfloor$, with PAR$_l \ge 2^{\lceil n/2 \rceil}$. Here is a stronger result.

Theorem 7 [18] In (1), let $t = 1$ and $f_{\gamma_j}$ be the identity permutation $\forall j$. Using (1), we can generate $s(x) = (-1)^{p(x)}$ for $p(x)$ constructed using (4). Then $PAR_l(s) = 2^{\lceil n/2 \rceil}$.

Definition 14 $PA(s) = 2^n \max_i (|s_i|^2)$

We now compute PA for any HI transform of a member of $\ell_p$. Let $s \in \ell_p$. Recalling Definition 10, let $k = |T_{C^\perp}|$, $k^\perp = |T_C|$, and $k + k^\perp = n$. Without loss of generality we renumber integer sets $T_{C^\perp}$ and $T_C$ so that $T_{C^\perp} = \{0, 1, \ldots, k-1\}$ and $T_C = \{k, k+1, \ldots, n-1\}$. Let $t_{C^\perp} \subset T_{C^\perp}$ and $t_C \subset T_C$, where $h = |t_{C^\perp}|$ and $h^\perp = |t_C|$. Let $x_{t^\perp} = \{x_i | i \in t_{C^\perp}\}$, $x_t = \{x_i | i \in t_C\}$, and $x_* = x_{t^\perp} \cup x_t$. Define $M$ to be a $k \times k^\perp$ binary matrix where $M_{i,j-k} = 1$ iff $x_ix_j \in p(x)$, and $M_{i,j-k} = 0$ otherwise. Thus $p(x) = \sum_{i \in T_{C^\perp}} x_i \left( \sum_{j \in T_C} M_{i,j-k} x_j \right)$. Let $M_t$ be a submatrix of $M$, which comprises only the rows and columns of $M$ specified by $t_{C^\perp}$ and $t_C$. Let $\chi_t$ be the rank of $M_t$.

Theorem 8 [18] Let $s'$ be the result of $\prod_{i \in t_{C^\perp} \cup t_C} H(i)$ on $s \in \ell_p$. Then,

$PA(s') = 2^{h + h^\perp - 2\chi_t}$

Corollary 1 As $0 \le \chi_t \le \min(h, h^\perp)$, it follows that, for $s \in \ell_p$, $PA(s') \ge 2^{|h - h^\perp|}$

In general, PAR$_l$ must consider PA(s) under all LUTs. PA(s) for $s \in \ell_p$ is easily computed. Let the ’HI multispectra’ be the union of the power spectra of $s$ under the action of $\prod_{i \in T} H(i)$, for all possible subsets, $T$, of $\{0, 1, \ldots, n-1\}$.

Theorem 9 [18] PAR$_l$ of $s \in \ell_p$ is found in the HI multispectra of $s$.

Theorem 9 means that, for $s \in \ell_p$, we only need compute the $2^n$ HI transforms to compute PAR$_l$. If PA(s) is optimally low over the HI multispectra, then $s' = m(x)$ is an optimal binary linear code when $T = T_C$ or $T = T_{C^\perp}$.
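The HI multispectra can be computed exhaustively for a small state. The sketch below uses the bipartite quadratic $p(x)$ of Example A (with $T_{C^\perp} = \{2,3\}$ and $T_C = \{0,1,4,5\}$), evaluates PA for all $2^6$ subsets $T$, and compares each value with $2^{h + h^\perp - 2\chi_t}$. That closed form is an assumed reading of Theorem 8, chosen to be consistent with Corollary 1; the bit-to-variable ordering and all names are illustrative assumptions.

```python
from itertools import combinations


def bipolar(n, monomials):
    """s_i = (-1)^p(i); bit k of the index i is taken as x_k (assumed ordering)."""
    s = []
    for i in range(2 ** n):
        p = 0
        for mono in monomials:
            if all((i >> k) & 1 for k in mono):
                p ^= 1
        s.append(-1 if p else 1)
    return s


def apply_h(s, k):
    """Apply the unnormalised kernel H to variable x_k only."""
    out = list(s)
    bit = 1 << k
    for idx in range(len(out)):
        if not idx & bit:
            a, b = out[idx], out[idx | bit]
            out[idx], out[idx | bit] = a + b, a - b
    return out


def rank_gf2(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def pa(s):
    """PA(s) = 2^n max_i |s_i|^2 after normalising s to unit energy."""
    return len(s) * max(v * v for v in s) / sum(v * v for v in s)


N = 6
P_A = [(0, 2), (1, 3), (2, 4), (3, 5), (2, 5)]
EDGES = {frozenset(e) for e in P_A}
T_PERP, T_C = (2, 3), (0, 1, 4, 5)


def predicted_pa(subset):
    """Assumed form of Theorem 8: 2^(h + h_perp - 2 * rank(M_t))."""
    rows_sel = [i for i in T_PERP if i in subset]
    cols_sel = [j for j in T_C if j in subset]
    m_t = [sum(1 << c for c, j in enumerate(cols_sel) if frozenset((i, j)) in EDGES)
           for i in rows_sel]
    return 2 ** (len(subset) - 2 * rank_gf2(m_t))


def multispectra_pa():
    """Map each subset T of {0..n-1} to PA of prod_{i in T} H(i)[s]."""
    s0 = bipolar(N, P_A)
    result = {}
    for size in range(N + 1):
        for subset in combinations(range(N), size):
            s = s0
            for k in subset:
                s = apply_h(s, k)
            result[subset] = pa(s)
    return result


if __name__ == "__main__":
    spectra = multispectra_pa()
    print(max(spectra.values()))
```

By Theorem 9 the maximum over the multispectra is PAR$_l$ of the state, so the same loop also yields LE via Definition 13.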

Definition 15 The Weight Hierarchy of a linear code $C$, is a series of parameters, $d_j$, $0 \le j \le k$, representing the smallest blocklength of a linear sub-code of $C$ of dimension $j$, where $d_k = n$, $d_1 = d$, and $d_0 = 0$.

Theorem 10 [18] Let $s_c$ be the indicator of an $[n,k,d]$ binary linear code, $C$. Let $Q \subset \{0, 1, \ldots, n-1\}$. Let,

$m_Q = \frac{|Q| + \log_2(\mu) - n + k}{2}$, where $\mu = PA(s')$ (7)

and $s' = \prod_{t \in Q} H(t)[s_c]$. Then the Weight Hierarchy of $C$ is found from the HI multispectra of $s_c$, where $d_j = \min_{Q | m_Q = j}(|Q|)$



Quantum measurement projects a system to a subsystem. This allows us to 
equate a series of quantum measurements with a series of subcodes of C. Let the 
entanglement order of a system be the size (in qubits) of the largest entangled 
subsystem of the system. A most-destructive series of j single-qubit measure- 
ments over some set of possible measurements on s produces a final state s' 
such that entanglement order(s) — entanglement order(s') is maximised. 

Definition 16 Stubborness of Entanglement (SE) is a series of parameters, $\beta_j$, $0 \le j \le k'$, representing smallest possible entanglement order, $\beta_j$, after $k' - j$ most-destructive measurements of an $n$-qubit system, where $\beta_{k'} = n$, $\beta_0 = 0$.

Theorem 11 [18] Let $s \in \ell_p$ where $s$ is LU equivalent to an optimal or near-optimal binary linear code of dimension $\le n/2$. Then Stubborness of Entanglement is equal to the Weight Hierarchy of the code.

Corollary 2 Quantum states from $\ell_p$ which have optimum LE and optimum SE are LU-equivalent to binary linear codes with optimum Weight Hierarchy.

The results of this section suggest the following modification of Conjecture 1.

Conjecture 2 States with optimal LE can be constructed from (1) if $L = 2$, and $(-1)^{g_j}$ also has optimal LE, $\forall j$. But what $f_{\gamma_j}$ should be chosen?

5 Discussion and Open Problems

We have highlighted the importance PAR plays (explicitly or implicitly) in current research. We emphasise four areas:

a) Low PAR error-correcting codes for OFDM and CDMA. 

b) Highly nonlinear, distinguishable sequence sets for cryptography. 

c) Graphical construction primitives for Factor Graphs which represent good 

error-correcting codes. 

d) Classification and quantification of quantum entanglement. 

We finish with a list of a few open problems.

- Construction (1) only provides an exact, implementable encoder if the two following sub-problems can be solved:

  • Provide algorithms to generate all permutation polynomials, $f_\gamma$, of degree $\mu - 1$. Degree 0 is trivial. Section 2.4 provides an answer for degree 1. But, for degree greater than 1 the situation is unclear.

  • Given an algorithm to generate all permutation polynomials, then construction (1) only generates distinct $p(x)$ for $t = 1$. For $t > 1$, the permutation, $\pi$, induces extra symmetries which cause many $p(x)$ to be generated more than once. This situation is reflected in (2), which is a strict upper bound for $t > 1$. It remains an open problem to provide an algorithm for $t > 1$ which ensures the generated $p(x)$ are distinct and form the whole code. Such an algorithm would replace the inequality of (2) with an exact expression.

- Construct decoders for the above codes.

- It is considered that successful iteration on a Factor Graph requires few short graph cycles. This is ensured if the graph has a large girth. How does one construct Factor Graphs with low PAR$_l$ and large girth?

- Provide a construction for optimally large sets, $P$, of pure quantum states such that each state satisfies a low upper bound on PAR$_l$, and where any two members of $P$ are optimally distinguishable. This problem is ’simply’ the LUT extension of the problem of low PAR error-correcting codes for OFDM and cryptography.

Abstract. The signature scheme SFLASH has been accepted as a candidate in the NESSIE (New European Schemes for Signatures, Integrity, and Encryption) project. We show that recovering the two secret affine mappings in SFLASH can easily be reduced to the task of revealing two linear mappings $F_{128}^{37} \to F_{128}^{37}$. In particular, the 74 bits representing these affine parts do by no means contribute a factor of $2^{74}$ to the effort required for mounting an attack against the system. This raises some doubts about the design of this NESSIE candidate.



1 Introduction 

In [3,4] the asymmetric signature scheme SFLASH has been proposed. It is intended for the use on low-cost smartcards and has been accepted as a candidate within the NESSIE (New European Schemes for Signatures, Integrity, and Encryption) project. The secret key in SFLASH consists of a secret 80-bit string $\Delta$, and two affine mappings $s$, $t$. However, owing to the verification procedure in this signature scheme knowing $\Delta$ is not vital for producing valid signatures, and this contribution shows that the affine parts of the secret affine mappings $s$ and $t$ are vulnerable to a very simple and efficient linear algebra-based attack. The attack makes crucial use of the fact that the secret affine mappings contain only {0,1}-entries, and has some conceptual similarity with the successful attacks against ENROOT and SPIFI described in [1].

We show that the public key of SFLASH alone (no signatures are required) is sufficient to reduce the number of candidates for the affine part of the secret key from $2^{74}$ to typically $2^{11}$. When the affine parts of the mappings $s$ and $t$ are known, breaking SFLASH amounts to revealing two linear mappings. Hence, although our attack does not break the system in total, we think it raises some doubts about the design of this NESSIE candidate.



2 Description of SFLASH 

We restrict our description to those aspects of SFLASH which are relevant for 
explaining our attack; for a more detailed description we refer to the SFLASH 
specification [3]. 



B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 355-359, 2001.
© Springer-Verlag Berlin Heidelberg 2001

2.1 Parameters of the Algorithm

In SFLASH three finite fields along with corresponding bijections are used:

- $K := F_2[X]/(X^7 + X + 1)$ along with the bijection
  $\pi: \{0,1\}^7 \to K$
  $(b_0, \ldots, b_6) \mapsto \sum_{i=0}^{6} b_i X^i \pmod{X^7 + X + 1}$

- $K' := \pi(\{0,1\} \times \{0\}^6)$ (which is isomorphic to $F_2$)

- $L := K[X]/(X^{37} + X^{12} + X^{10} + X^2 + 1)$ along with the bijection
  $\varphi: K^{37} \to L$
  $(b_0, \ldots, b_{36}) \mapsto \sum_{i=0}^{36} b_i X^i \pmod{X^{37} + X^{12} + X^{10} + X^2 + 1}$

Secret key. The secret key is comprised of three parts:

- $\Delta \in \{0,1\}^{80}$: a secret 80-bit string

- $s = (S_L, S_C)$: an affine bijection $K^{37} \to K^{37}$ given by a $37 \times 37$ matrix $S_L \in K'^{37 \times 37}$ and a column vector $S_C \in K'^{37}$

- $t = (T_L, T_C)$: an affine bijection $K^{37} \to K^{37}$ given by a $37 \times 37$ matrix $T_L \in K'^{37 \times 37}$ and a column vector $T_C \in K'^{37}$

For deriving the corresponding public key also the function

$F: L \to L$
$a \mapsto a^{128^{11}+1}$

is needed. Moreover, for a bitstring $\lambda = (\lambda_0, \ldots, \lambda_m)$ and integers $0 \le r \le s \le m$ we write $[\lambda]_{r \to s}$ for the bitstring $(\lambda_r, \lambda_{r+1}, \ldots, \lambda_{s-1}, \lambda_s)$. Finally, the concatenation of two tuples $\lambda = (\lambda_0, \ldots, \lambda_m)$, $\mu = (\mu_0, \ldots, \mu_n)$ will be denoted

$\lambda \| \mu := (\lambda_0, \ldots, \lambda_m, \mu_0, \ldots, \mu_n)$.

Public key. The public key is the function $G: K^{37} \to K^{26}$ defined by

$G(X) = [t(\varphi^{-1}(F(\varphi(s(X)))))]_{0 \to 181}$.

By construction $(Y_0, \ldots, Y_{25}) = G(X_0, \ldots, X_{36})$ can be expressed in the form

$Y_0 = P_0(X_0, \ldots, X_{36})$
$\vdots$
$Y_{25} = P_{25}(X_0, \ldots, X_{36})$

where each $P_i$ is a polynomial of total degree $\le 2$ with coefficients in $K'$.

For describing our attack, knowledge about the original signing procedure (which makes use of the secret key) is not necessary. So we omit its description here, and proceed with the description of the (public) verification procedure.

2.2 Verifying a Signature

A signature $S$ of a bitstring $M$ can be represented as a 259-bit string. For verifying the validity, the following steps are to be performed:

1. Compute the 182-bit string

   $V := [\text{SHA-1}(M)]_{0 \to 159} \| [\text{SHA-1}(\text{SHA-1}(M))]_{0 \to 21}$.

2. Compute

   $Y := (\pi([V]_{0 \to 6}), \pi([V]_{7 \to 13}), \ldots, \pi([V]_{175 \to 181})) \in K^{26}$
   $Y' := G(\pi([S]_{0 \to 6}), \pi([S]_{7 \to 13}), \ldots, \pi([S]_{252 \to 258})) \in K^{26}$

3. If $Y = Y'$ then the signature is accepted, otherwise it is rejected.
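Step 1 and the block splitting of step 2 can be sketched with the standard library. Two points below are assumptions, not taken from the SFLASH specification: bits are read most-significant-first within each byte, and the inner SHA-1 digest is re-hashed as a 20-byte string. The function names are illustrative, and the evaluation of $G$ itself is not attempted.

```python
import hashlib


def bits_of(data):
    """Bits of a byte string, most significant bit of each byte first (assumed ordering)."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def sflash_v(message):
    """V := [SHA-1(M)]_{0->159} || [SHA-1(SHA-1(M))]_{0->21}, as a list of 182 bits."""
    h1 = hashlib.sha1(message).digest()
    h2 = hashlib.sha1(h1).digest()      # assumption: the raw digest bytes are re-hashed
    return bits_of(h1)[:160] + bits_of(h2)[:22]


def split_into_k(bits):
    """Cut a bit list into 7-bit blocks, one block per element of K (input to pi)."""
    return [bits[i:i + 7] for i in range(0, len(bits), 7)]


if __name__ == "__main__":
    v = sflash_v(b"abc")
    blocks = split_into_k(v)
    print(len(v), len(blocks))
```

The 26 blocks are the arguments of $\pi$ that form $Y$; a 259-bit signature splits the same way into 37 blocks.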

3 Attacking the Affine Parts

First of all it is worth remarking that the public key in SFLASH does not depend on the secret bitstring $\Delta$. Consequently, the above verification procedure does not ensure that the correct value of $\Delta$ has been used for computing $S$. In fact, for forging a signature it is completely sufficient to find affine bijections $s', t': K^{37} \to K^{37}$ such that the public key $G(X)$ satisfies

$G(X) = [t'(\varphi^{-1}(F(\varphi(s'(X)))))]_{0 \to 181}$.

Then a valid signature $S$ for a bitstring $M$ can be computed as follows:

- compute $Y \in K^{26}$ as above;

- append 11 random elements of $K$ to $Y$, yielding $\tilde{Y} \in K^{37}$;

- compute $S := [s'^{-1}(\varphi^{-1}(F^{-1}(\varphi(t'^{-1}(\tilde{Y})))))]_{0 \to 258}$.

The last 11 components of the affine mapping $t'$ are not used for checking the signature, thus the part of $t'$ corresponding to these 11 components can be chosen arbitrarily such that the resulting $t'$ is invertible.

The aim of the procedure described below is to diminish the number of candidates for the affine parts of $s'$, $t'$ from $2^{37} \cdot 2^{37}$ to (typically) $2^{11}$ elements. In particular the affine parts of the original secret bijections $s$ and $t$ must be contained in this set of candidates. The attack is based on

Observation 1 Let v := S~[^ ■ Sq G AT'^^ and a £ K with -|- a -I- 1 = 0. 

Then s((a-|- 1) • u) = a • Sc G is a vector with coefficients a and 0 only, 
and — owing to the definition of F — the vector F{ip{s{{a+l)-v))) has coefficients 
and 0 only. 

As the multiplication Tc-ip~^{F{Lp{s{{a+l)-v)))) does not affect this property, 
the vector Tq can he read off directly from 

i{T~^{F{if{s{{a + 1) • v))))) = Tl ■ ip~^{F{ip{s{{a + 1) • u)))) -k Tc 

(via the substitution 0). 

We want to use this observation to derive candidates for $T_c$ and $S_L^{-1} \cdot S_c$ from
the given public key $G(X)$. Knowing the correct value of the vectors $T_c$ and
$S_L^{-1} \cdot S_c$ we can express the secret parameters $s$ and $t$ in the form

$s(b_0, \dots, b_{36}) = S_L \cdot \big( (b_0, \dots, b_{36})^T + \underbrace{S_L^{-1} \cdot S_c}_{\text{known}} \big)$
$t(b_0, \dots, b_{36}) = T_L \cdot (b_0, \dots, b_{36})^T + \underbrace{T_c}_{\text{known}}$

In other words, breaking SFLASH reduces to finding the linear mappings
specified by $S_L$ and $T_L$.

All the vectors $v \in K'^{37}$ for which the image of $(\alpha + 1) \cdot v$ under the public
polynomials contains no $\alpha$ are candidates for $S_L^{-1} \cdot S_c$. For testing this property
the quadratic and constant terms of the public key are of no interest
($(\alpha + 1) \cdot (\alpha + 1)$ and 0, 1 do not add up to $\alpha$). Thus the only relevant parts of the public
key are the linear terms. In other words, we are interested in finding all those
elements from $\{0, \alpha + 1\}^{37}$ that yield the zero vector when being substituted
in the linear part of the public key. Of course, for symmetry reasons, this is
equivalent to finding all elements from $\{0, 1\}^{37} = K'^{37}$ that yield zero upon
substitution in the linear part of the public key. Hence, all we have to do is
setting the linear parts of the public key simultaneously to 0; each solution
$v \in K'^{37}$ of the corresponding homogeneous system of linear equations over $K'$
is a candidate for $S_L^{-1} \cdot S_c$.

If the linear parts of the public key are linearly independent we obtain in this
way $2^{37-26} = 2^{11}$ candidates. Of course, if the equations are linearly dependent
the number of solutions increases; in several hundred experiments we did with
the computer algebra system MAGMA [2] this never happened.

Evaluating the public key at $(\alpha + 1) \cdot v$ for a candidate $v$ for $S_L^{-1} \cdot S_c$ results in a
vector whose only non-zero coefficients are $\alpha^2$ and 1. Moreover, the term 1 occurs
in this vector iff there is a 1 in $T_c$ at the appropriate coordinate. In this way we
obtain the first 26 entries of the vector $T_c$ corresponding to our candidate $v$. As
the remaining 11 entries of $T_c$ have no influence on the validity of a signature
anyway (and hence can be chosen arbitrarily), this simple procedure reduces
the number of candidates for the affine parts in SFLASH from originally $2^{74}$ to
typically $2^{11}$ possibilities.
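
The linear-algebra step can be illustrated with a small Python sketch that solves a homogeneous system over GF(2). The 26 linear forms in 37 unknowns are replaced here by random bit masks, which is purely a stand-in for the linear part of a real public key; the function names and the bit-mask encoding (bit j of a row is the coefficient of $X_j$) are assumptions of this sketch, not part of SFLASH.

```python
import random

def nullspace_gf2(rows, ncols):
    """Basis of {v : row . v = 0 over GF(2) for every row}; rows and vectors are bit masks."""
    pivots = {}  # pivot column -> row in reduced row echelon form
    for row in rows:
        for col, prow in pivots.items():
            if (row >> col) & 1:
                row ^= prow          # clear existing pivot columns from the new row
        if row == 0:
            continue                 # linearly dependent equation
        col = row.bit_length() - 1   # new pivot column
        for c in list(pivots):
            if (pivots[c] >> col) & 1:
                pivots[c] ^= row     # keep older rows reduced
        pivots[col] = row
    basis = []
    for free in (j for j in range(ncols) if j not in pivots):
        v = 1 << free                # set one free variable to 1
        for col, prow in pivots.items():
            if (prow >> free) & 1:
                v |= 1 << col        # pivot variable forced by the equation
        basis.append(v)
    return basis

def dot_gf2(row, v):
    """Inner product over GF(2) of two bit masks."""
    return bin(row & v).count("1") % 2

rng = random.Random(1)
linear_part = [rng.getrandbits(37) for _ in range(26)]  # hypothetical stand-in
basis = nullspace_gf2(linear_part, 37)
print(len(basis), 2 ** len(basis))  # dimension and number of candidates v
```

When the 26 equations are linearly independent the basis has 37 - 26 = 11 elements, which corresponds to the $2^{11}$ candidates mentioned above.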



4 Conclusion 

The above discussion shows that the affine parts of the secret key of SFLASH 
are vulnerable to a very simple linear algebra-based attack. Namely, instead of 
considering $2^{74}$ possible affine parts, an attacker can usually restrict to only $2^{11}$
candidates. For narrowing the key space in this way knowledge of the public 
key is sufficient; no signatures have to be known. Although this attack does not 
break SFLASH in total, we think it raises some questions about the design of 
this NESSIE candidate. 

Abstract. We present a novel public key cryptosystem in which the
public key of a subscriber can be chosen to be a publicly known value, 
such as his identity. We discuss the security of the proposed scheme, 
and show that this is related to the difficulty of solving the quadratic 
residuosity problem.



1 Introduction 

In an offline public key system, in order to send encrypted data it is necessary 
to know the public key of the recipient. This usually necessitates the holding 
of directories of public keys. In an identity based system a user’s public key 
is a function of his identity (for example his email address), thus avoiding the 
need for a separate public key directory. The possibility of such a system was 
first mentioned by Shamir [4], but it has proved difficult to find implementations 
that are both practical and secure, although recently an implementation based 
on elliptic curves has been proposed [3]. This paper describes an identity based 
cryptosystem which uses quadratic residues modulo a large composite integer. 



2 Overview of Functionality 

The system has an authority which generates a universally available public
modulus M. This modulus is a product of two primes P and Q, held privately by
the authority, where P and Q are both congruent to 3 mod 4.

Also, the system will make use of a universally available secure hash function. 
Then, if user Alice wishes to register in order to be able to receive encrypted 
data she presents her identity (e.g. e-mail address) to the authority. In return
she will be given a private key with properties described below. 

A user Bob who wishes to send encrypted data to Alice will be able to do 
this knowing only Alice’s public identity and the universal system parameters. 
There is no need for a public key directory. 



3 Description of the System 

When Alice presents her identity to the authority, the hash function is applied 
to the string representing her identity to produce a value a modulo M such that 
the Jacobi symbol $\left(\frac{a}{M}\right)$ is +1. This will be a public process that anyone holding
the universal parameters and knowing Alice's identity can replicate. Typically
this will involve multiple applications of the hash function in a structured way
to produce a set of candidate values for $a$, stopping when $\left(\frac{a}{M}\right) = +1$. Note that
the Jacobi symbol can be calculated without knowledge of the factorisation of
M. See for example [2].
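
One possible shape of this public derivation is sketched below in Python. The text does not fix the construction, so the use of SHA-256, the 4-byte counter and the reduction modulo M are all assumptions of this sketch (for a 1024-bit modulus the hash output would in practice have to be expanded to cover the whole range of M). The function `jacobi` is the standard algorithm and needs no factorisation of M.

```python
import hashlib

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                      # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def identity_to_a(identity: str, M: int) -> int:
    """Hash identity || counter until a candidate with Jacobi symbol +1 is found."""
    counter = 0
    while True:
        data = identity.encode() + counter.to_bytes(4, "big")
        a = int.from_bytes(hashlib.sha256(data).digest(), "big") % M
        if jacobi(a, M) == 1:
            return a
        counter += 1

M = 2147483647 * 1000000007  # toy modulus; both primes are congruent to 3 mod 4
a = identity_to_a("alice@example.com", M)
print(jacobi(a, M))
```

Anyone who knows M and the identity string can repeat this computation and obtain the same value of a.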

Thus as $\left(\frac{a}{M}\right) = +1$, $\left(\frac{a}{P}\right) = \left(\frac{a}{Q}\right)$, and so either $a$ is a square modulo both P
and Q, and hence is a square modulo M, or else $-a$ is a square modulo P, Q
and hence M. The latter case arises because by construction P and Q are both
congruent to 3 mod 4, and so $\left(\frac{-1}{P}\right) = \left(\frac{-1}{Q}\right) = -1$. Thus either $a$ or $-a$ will be
quadratic residues modulo P and Q. Only the authority can calculate the square
root modulo M, and he presents such a root to Alice. Let us call this value r.
One way for the authority to determine a root is to calculate 

$r = a^{\frac{M + 5 - (P + Q)}{8}} \bmod M$

Such an $r$ will satisfy either $r^2 = a \bmod M$ or $r^2 = -a \bmod M$ depending upon
which of $a$ or $-a$ is a square modulo M.
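
A toy check of this formula, with deliberately tiny primes, might look as follows in Python. The primes 7 and 11 and the helper name `extract_root` are illustrative assumptions; a real modulus would be of the order of 1024 bits.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def extract_root(a: int, P: int, Q: int) -> int:
    """r = a^((M + 5 - (P + Q)) / 8) mod M, computable only with knowledge of P and Q."""
    assert P % 4 == 3 and Q % 4 == 3
    M = P * Q
    exponent = (M + 5 - (P + Q)) // 8   # an integer because P and Q are 3 mod 4
    return pow(a, exponent, M)

P, Q = 7, 11
M = P * Q
for a in (4, 6):                         # 4 is a square mod 77, 6 is not
    r = extract_root(a, P, Q)
    print(a, r, (r * r) % M)
```

For a = 4 the square of r is a itself, while for a = 6 it is M - a, matching the two cases described above.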

In what follows, I will assume without loss of generality that $r^2 = a \bmod M$.
Users wishing to send encrypted data to Alice who do not know whether she 
receives a root of a or a root of —a will need to double up the amount of keying 
data they send as described later. 

If Bob wants to send an encrypted message to Alice, he first generates a 
transport key and uses it to encrypt the data using symmetric encryption. He 
sends to Alice each bit of the transport key in turn as follows: 

Let $x$ be a single bit of the transport key, coded as +1 or $-1$.

Then Bob chooses a value $t$ at random modulo M, such that the Jacobi
symbol $\left(\frac{t}{M}\right)$ equals $x$.

Then he sends $s = (t + a/t) \bmod M$ to Alice.

Alice recovers the bit x as follows: 

as $s + 2r = t(1 + r/t) \cdot (1 + r/t) \bmod M$

it follows that the Jacobi symbol $\left(\frac{s + 2r}{M}\right) = \left(\frac{t}{M}\right) = x$.

But Alice knows the value of $r$ so she can calculate the Jacobi symbol
$\left(\frac{s + 2r}{M}\right)$ and hence recover $x$.
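
The exchange of a single key bit can be sketched in Python as below. The modulus is a toy value built from two well-known primes congruent to 3 mod 4, and the names `encrypt_bit` and `decrypt_bit` as well as the way the key pair (a, r) is produced are assumptions of this sketch. The rare case in which $1 + r/t$ shares a factor with M (where the symbol would be 0) is ignored.

```python
import random

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def encrypt_bit(x: int, a: int, M: int, rng: random.Random) -> int:
    """Bob: pick random t with (t/M) = x and send s = t + a/t mod M."""
    while True:
        t = rng.randrange(1, M)
        if jacobi(t, M) == x:            # a non-zero symbol also implies gcd(t, M) = 1
            return (t + a * pow(t, -1, M)) % M

def decrypt_bit(s: int, r: int, M: int) -> int:
    """Alice: x = ((s + 2r)/M), using her private root r."""
    return jacobi(s + 2 * r, M)

M = 2147483647 * 1000000007              # toy modulus, factors known only to the authority
r = 123456789                            # Alice's private root (chosen directly for the demo)
a = (r * r) % M                          # her public value, so that r^2 = a mod M

rng = random.Random(2024)
key_bits = [1, -1, -1, 1]
recovered = [decrypt_bit(encrypt_bit(x, a, M, rng), r, M) for x in key_bits]
print(recovered)
```

The three-argument `pow` with exponent -1 (Python 3.8 or later) computes the modular inverse needed for the division by t.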

If Bob does not know which of $a$ or $-a$ is the square for which Alice holds
the root, he will have to replicate the above, using different randomly chosen $t$
values to send the same $x$ bits as before, and transmitting $s = (t - a/t) \bmod M$
to Alice at each step. This doubles the amount of keying data that Bob sends.
It would be useful to find a way to avoid having to send this extra information, 
but at present this is an unsolved problem. 

4 Practical Aspects 

Computationally, the system is not too expensive. If the transport key is L bits 
long, then Bob’s work is dominated by the need to compute L Jacobi symbols 
and L divisions mod M. Alice's work mainly consists of computing L Jacobi
symbols. For typical parameter values (e.g. L = 128 and M of size 1024 bits) 
and depending upon the implementation this is likely to be no more work than 
is needed for a single exponentiation modulo M . 

The main issue regarding practicality is the bandwidth requirement, as each 
bit of the transport key requires a number of size up to M to be sent. For a 
128 bit transport key, and using a 1024 bit modulus M, Bob will need to send 
16K bytes of keying material. If Bob does not know whether Alice has received 
the square root of a or of —a then he will have to double this. Nevertheless, for 
offline use such as email this will often be an acceptable overhead. 
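
The bandwidth figure follows from simple arithmetic, shown here as a short Python calculation; the function name `keying_bytes` is only illustrative.

```python
def keying_bytes(key_bits: int, modulus_bits: int, both_signs: bool = False) -> int:
    """One value of modulus size per key bit, doubled if both a and -a must be covered."""
    total_bits = key_bits * modulus_bits * (2 if both_signs else 1)
    return total_bits // 8

print(keying_bytes(128, 1024))        # 16384 bytes, i.e. 16K bytes
print(keying_bytes(128, 1024, True))  # 32768 bytes when Bob has to double up
```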



5 Security Analysis 

Clearly, one way to break the system is to factorise M. The fact that this is 
a weak link means that shared knowledge methods of generating M (see [1] 
for example) and the use of multiple authorities will be desirable. With shared 
generation of M it is also feasible to generate the exponent $\frac{M + 5 - (P + Q)}{8}$ used to
compute square roots in a shared fashion, so that no master secret ever needs
to exist in a single location.

We study the security of the system on the assumption that M has not been 
factorised. We consider first a passive attack against the generation of each bit of 
transport key and show that a weakness would lead to a solution of the quadratic 
residuosity problem. 

Suppose that there is a procedure that recovers x from s without knowing 
either r or the factors of M. We also assume a constraint on the hash function, 
that the recovery procedure takes as input the hashed identity a, and can not 
make separate use of the input to the hash. This excludes obviously weak hash 
functions, such as one whose final step consists of squaring modulo M . Under 
this hypothesis the breaking process computes a mapping 

$F(M, a, s) \to x = \left(\frac{t}{M}\right)$

valid whenever $s = (t + a/t) \bmod M$ for some $t$.

Then consider what the value of $F$ could be if evaluated for an $a$ where the
Jacobi symbol $\left(\frac{a}{M}\right)$ is +1, but $a$ is not a square. In this case the Jacobi symbols
$\left(\frac{a}{P}\right)$ and $\left(\frac{a}{Q}\right)$ will both be $-1$.

Now, if $t$ was the value used to calculate $s$, there will be three other values
$t_1, t_2, t_3$ giving the same value of $s$. These are given by:

$t_1 = t \bmod P$        $t_1 = a/t \bmod Q$
$t_2 = a/t \bmod P$      $t_2 = t \bmod Q$
$t_3 = a/t \bmod P$      $t_3 = a/t \bmod Q$

But as $\left(\frac{a}{P}\right) = \left(\frac{a}{Q}\right) = -1$, then $\left(\frac{t_1}{M}\right) = \left(\frac{t_2}{M}\right) = -\left(\frac{t}{M}\right) = -\left(\frac{t_3}{M}\right)$.

So, there is no unique $\left(\frac{t}{M}\right)$ to recover, and so $F$ cannot return $\left(\frac{t}{M}\right)$ correctly
more than half the time whenever $a$ is not a square. Hence we would have a
procedure that can distinguish the two cases of $\left(\frac{a}{M}\right) = +1$; that is determine
whether a is a square or a non square without factoring M. This is the quadratic 
residuosity problem which is currently unsolved, and is a problem on which a 
number of other public key systems are based. 
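
The counting argument can be checked numerically with toy parameters. In the Python sketch below, P = 7, Q = 11 and a = 6 (a non-residue modulo both primes, so its Jacobi symbol modulo 77 is +1) are illustrative choices, and `crt` and `colliding_values` are helper names introduced here.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def crt(xp: int, xq: int, P: int, Q: int) -> int:
    """The unique x mod P*Q with x = xp mod P and x = xq mod Q."""
    return (xp + P * ((xq - xp) * pow(P, -1, Q) % Q)) % (P * Q)

def colliding_values(t: int, a: int, P: int, Q: int) -> list[int]:
    """t together with t1, t2, t3: all combinations of t and a/t modulo P and Q."""
    M = P * Q
    u = a * pow(t, -1, M) % M            # u = a/t mod M
    return [crt(t % P, t % Q, P, Q), crt(t % P, u % Q, P, Q),
            crt(u % P, t % Q, P, Q), crt(u % P, u % Q, P, Q)]

P, Q, a, t = 7, 11, 6, 2
M = P * Q
values = colliding_values(t, a, P, Q)
s_values = [(v + a * pow(v, -1, M)) % M for v in values]
symbols = [jacobi(v, M) for v in values]
print(values, s_values, symbols)
```

All four values map to the same s, while their Jacobi symbols split evenly between +1 and -1, so s alone carries no information about the transmitted bit in this case.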

Of course, an attacker will in practice be presented with a set of many such 
terms $(t + a/t) \bmod M$ and possibly also $(t - a/t) \bmod M$ for different values
of t. It is desirable that the values of t used are independent and randomly 
distributed over the set of values consistent with the desired key value. For if 
successive values of t are related in a systematic way this opens up the possibility 
of an attack against the set of transmitted values. 

The scheme as described so far is vulnerable to an adaptive chosen ciphertext 
attack. Because the transport key is established one bit at a time, an attacker 
could take a target transmission and modify the component corresponding to just 
one bit of transport key at a time, changing it to produce a transport key value 
known to the attacker. By observing the decrypt to see whether this changes the 
transport key the attacker could recover the transport key a bit at a time. 

I sketch here an outline of how one might block such attacks. The approach 
is to add redundancy to the transport key establishment data so that only a 
small proportion of randomly chosen messages will decrypt in a valid way, and 
arrange that if the recipient is presented with an invalid message then the only 
output will be the information that the message is invalid. This should be done 
in a way that prevents an attacker devising challenges which may be of use in an 
attack. For the system described here we propose sending, suitably encrypted, 
data that will allow the t values to be reconstructed and then checked by the 
recipient, along with a cryptographic hash of those t values. This string would 
be produced separately for the two cases of square root (a and —a respectively) 
that may be held by the recipient. 

Abstract. In this paper we describe an efficient and secure hardware
implementation of the RSA cryptosystem. Modular exponentiation is
based on Montgomery's method without any modular reduction, achieving
the optimal bound. The presented systolic array architecture is scalable
in several parameters, which makes it possible to implement Compaq's
MultiPrime™ in a very efficient way. According to a developed
performance model the influence of different parameters is investigated. 
This platform is optimised for Multiprime as an example for the RSA 
cryptosystem. In this work we give details about this scheme, which uses 
three or more factors of the composite N. Security of this scheme, related 
to this architecture is also presented. 

Keywords: Montgomery multiplication, modular exponentiation, Mul- 
tiPrime, Chinese Remainder Theorem, systolic array, performance model, 
scalability 



1 Introduction 

The basic operation in RSA cryptosystems is modular exponentiation, which is 
based on a repeated modular multiplication. So implementing this operation in 
an efficient way makes numerous applications possible.

In 1985 Peter Montgomery introduced a new method for modular multipli- 
cation. This operation is widely used in most cryptographic protocols (public 
and private key cryptography). The approach of Montgomery avoids the time 
consuming trial division that is a common bottleneck of other algorithms. His 
method proved to be very efficient and is the basis of many implementations 
of modular multiplication, both in software and hardware. In this paper we are 
interested in a hardware implementation. 

Our contribution is in combining a systolic array architecture with Mont- 
gomery based RSA implementation. This design has evolved to a secure and 
efficient cryptographic device used in different applications. 

Various definitions are possible, when introducing scalability. It is usually 
referred to as the ability to process a variety of number lengths at the same 
time. Although this platform consists of an array of fixed length it is scalable 
according to the previous definition. 
The performance model shows two-fold behaviour. It can be shown that for
specific modulus lengths the performance is quadratic in n. For larger modulus 
size the performance is cubic. The fixed word size causes jumps in the perfor- 
mance curve. More PEs do not always result in a better performance. Smaller 
parameters make the design more scalable in the sense that it has a better over- 
all performance. A larger number of LNAUs generally improves the performance 
but does not change the latency. In this work we describe an efficient imple- 
mentation of modular exponentiation in a systolic array architecture, as part of 
the RSA cryptography. We use the methods of Montgomery, proved to be very 
secure in hardware. Namely, we achieved the optimal bound which, with some 
savings in hardware, omits completely all reduction steps that are presumed to 
be vulnerable to side-channel attacks. The presented implementation has a world 
class performance in the sense of speed and power consumption. 

The remainder of this paper is organized as follows. Section 2 gives a survey 
of previous work on systolic arrays and Montgomery based operations in hard- 
ware. In Section 3, we give a short introduction to Multiprime and the Chinese 
Remainder Theorem. In Section 4 we introduce the architecture of the targeted 
platform and its specific properties, one of which is the scalability of the de- 
vice. Section 5 describes the performance with an example of 1024 bit RSA and 
Multiprime. In Section 6 the security of the proposed architecture is discussed. 
This includes the latest developments on the optimisation of the Montgomery 
Algorithm. Section 7 concludes the paper. 

2 Previous work 

Some of the most relevant previous contributions on modular multiplication and 
systolic arrays are reviewed in this section. The earliest work on hardware imple- 
mentation of the Modular Multiplication Method (MMM) of P.L. Montgomery is 
[5]. The authors have shown efficient use of hardware. The work of Iwamura et al.
([8]) followed, to our knowledge the first one presenting a systolic array
performing the modular exponentiation operation using the Montgomery modular
multiplication. This work is relevant for our architecture (as for many others),
which however went further in efficiency. The usual bottleneck for hardware
implementations of Montgomery's algorithm is the fact that the number of output
bits may exceed the number of input bits. Iwamura et al. derived the following
bound for the Montgomery parameter: $R > 2^{n+2}$ where $R = 2^r$. For this value of
R the examination of the size of the output each time the Montgomery method
is executed may be omitted. Here, n denotes the number of bits of N. It can be
shown how to improve on this with the condition $R > 4N$ [4], which is according
to work presented in [16] assumed to be the best possible bound in practice. In
Section 6 this will be explained in detail. A scalable architecture is introduced 
in [3]. There is no limitation on the maximum number of bits manipulated by 
the multiplier, and the selection of the word size can be freely chosen accord- 
ing to desirable criteria. Our architecture is also considered scalable according 
to this definition which has various possibilities, such as doing RSA (including 
Multiprime) and Elliptic Curve Cryptography on the same chip. However, we
are focused more on the broader meaning of scalability, which will be explained 
further in the following section. 

In [15], which is further improved in [16], the author showed that the Mont- 
gomery exponentiation method requires no final subtraction, which is very im- 
portant for fast implementation. Another benefit is that conditional statements, 
which may be subject to side-channel attacks such as timing attacks, power analysis
attacks etc., may be omitted. Some other results on constant-time
implementations, which are presumed to be a first step towards secure hardware
solutions, are proposed in [7]. However, the result of [16] presents the best possible bound.
This bound is also practical and implemented within our architecture. 

3 Multiprime 

In April 2000 Compaq and RSA Security Inc. announced a new patented technology,
MultiPrime™ ([2]), as a generalization of the RSA protocol [13]. Instead
of a modulus $N = p \cdot q$, as in the traditional RSA system, $N$ is a product of three
or more (distinct) prime numbers. The idea was that increasing the number of 
factors and using the CRT with parallel exponentiators should increase perfor- 
mance. In general, dependence of the performance of modular exponentiation 
and the length of modulus is not linear. It is approximately cubic or quadratic 
depending on the implementation, which will be discussed afterwards. A suitable
performance model has been designed, which is represented by the performance
curve. For every platform this curve changes from quadratic to cubic behaviour
as the length of the modulus grows.

However, a high level of security has to be preserved. The length of N is not 
the only relevant factor that provides it, because having smaller factors makes 
some methods of factoring more efficient. This observation limits the number of 
prime factors of N. In [2] the tradeoff between efficiency and security results in 
up to 4 prime factors of N for the modulus lengths of 2048 bits. 

4 Scalable Systolic Array 

In this section we describe the architecture of the PCC-ISES ([1]). The PCC-ISES
is an integrated circuit with an architecture that is very suitable for modular
multiplication. For this modular multiplication Montgomery's method is
chosen and the notation is as follows:

$\mathrm{Mont}(X, Y) = XYR^{-1} \bmod N$



4.1 Systolic array 

The design contains two identical Large Number Arithmetic Units (LNAU), each 
designed as a systolic array. This array is one-dimensional and consists of a fixed 
number of Processing Elements (PEs). In the remainder of this paper P denotes
the number of PEs. This architecture can be scaled to every desired configuration.
A PE consists of some adders and multipliers that can process $\alpha$ bits
of X and $\beta$ bits of Y ($\alpha$ and $\beta$ are not necessarily of the same length) in one
clock cycle. So, in one clock cycle several additions and multiplications can be 
performed in all PEs which differs from relevant work of other authors. More 
precisely, a chip based on our architecture performs one loop of the multipli- 
cation algorithm in one cycle. (Other authors usually consider one addition or 
multiplication in one cycle.) In this way the two LNAUs of the PCC-ISES pro- 
vide extremely fast RSA protocols. Specific commands are defined for modular 
multiplication and exponentiation, which can be used by the ARM7 processor to 
access the accelerator. Actually, this hardware accelerator can perform two mod- 
ular multiplication operations at the same time, which provides the possibility 
of implementing algorithms in parallel. 



4.2 Scalability 

Scalability may be defined as the property that a variety of number lengths can
be processed on the same platform. Although the PCC-ISES consists of an array
of fixed length it is scalable according to the previous definition. If the operands
are "too large" to fit in the available number of PEs the intermediate result of
the last PE is fed into the first PE. These intermediate results are temporarily
stored in a FIFO memory, if necessary.

One advantage of this scalability is that different modulus lengths can be 
processed on the same hardware device without having to pad to a larger mod- 
ulus. This means that Multiprime as well as standard RSA can be used on the 
same system. 

In this paper we only discuss modular exponentiation. However the PCC-ISES
supports more cryptographic functionality. For the sake of completeness a
short description of this platform is given.

PCC-ISES has the following characteristics: Embedded Cryptographic Accel- 
erators with 2 LNAUs capable of performing up to 2048-bit modular arithmetic. 
Embedded microprocessor ARM7, 128 KB embedded RAM, and other features 
required for various cryptographic applications. For more details the reader is
referred to [1]. 



5 Performance 

To examine the behaviour of all PCC-ISES scenarios a performance model has 
been developed. In the first part of this section this model is presented. In the 
second part the model is used to display some results for different keylengths 
and parameter settings. 
5.1 Performance model



Depending on the targeted application, the parameters of this architecture can be
chosen. There is always a trade-off between the size of the IC and the performance.
To understand the influences of the different parameters a performance
model is made. The modular exponentiation is implemented as a repeated Montgomery
multiplication in the sense of some of the square-and-multiply methods.
Re-coding of the exponent is one possibility to reduce the number of multiplications
but this number stays of the order of the exponent length [13]. The overall
performance is also determined by the performance of a single multiplication.
The inputs of the multiplication are divided into words, not necessarily of the
same length. The words of X are distributed over the PEs. When there are not
enough PEs more rounds are needed. So the number of rounds is $\lceil n'/P \rceil$, where $n'$
is the number of words of X of length $\alpha$ bits. In one round each word of Y has to
pass all PEs. Each PE calculation takes one clock-cycle. So, in P clock-cycles a
word of Y passes all PEs in the array. When the number of words of Y is larger
than the number of PEs the FIFO memory is used to store the intermediate
results of the last PE. Now each round costs $\max(P, n'')$ cycles, where $n''$ is the
number of words of Y of length $\beta$ bits. The performance (number of clock-cycles
for one exponentiation) is now modeled as:

$\left\lceil \frac{n'}{P} \right\rceil \cdot \max(P, n'') \cdot c_1 \cdot n$    (1)



Here $n$ is the length of the modulus in bits and $c_1 \cdot n$ is the number of
multiplications needed for the exponentiation. The value for $c_1$ is typically 1.5
for the well known left to right square and multiply exponentiation method.
When using exponent re-coding the value of $c_1$ is usually 1.2 for an exponent with
about half of the bits being zero. It is easy to conclude from (1) that for modulus
lengths with $n'' < P$ the performance is quadratic in n. The performance is cubic
in the modulus length for larger modulus sizes. The fixed word size of $\alpha$ and $\beta$
bits (the ceiling function in the model) causes jumps in the performance curve
(see Figure 3).
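
Model (1) is easy to evaluate directly. The Python sketch below assumes that the word counts are obtained by rounding up, i.e. $n' = \lceil n/\alpha \rceil$ and $n'' = \lceil n/\beta \rceil$, which the text does not state explicitly; the function names and the default values $\alpha = \beta = 16$ and $c_1 = 1.5$ are likewise choices made for this illustration.

```python
from math import ceil

def words(n_bits: int, word_bits: int) -> int:
    """Number of words needed for an n-bit operand (assumed: rounded up)."""
    return ceil(n_bits / word_bits)

def exponentiation_cycles(n: int, P: int, alpha: int = 16, beta: int = 16,
                          c1: float = 1.5) -> float:
    """Clock cycles for one exponentiation according to model (1)."""
    n1 = words(n, alpha)            # n'  : words of X
    n2 = words(n, beta)             # n'' : words of Y
    rounds = ceil(n1 / P)           # rounds per multiplication
    return rounds * max(P, n2) * c1 * n

for n in (342, 512):
    for P in (11, 16, 22, 32, 33):
        print(n, P, exponentiation_cycles(n, P))
```

Under these assumptions a 512-bit operand needs 32 words, so going from 32 to 33 PEs increases the modelled cycle count instead of lowering it, which illustrates the jumps caused by the ceiling function.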

The model shows that more PEs are not always the best option. The minimal
number of PEs requiring only one round for a certain modulus length seems to
be the most efficient solution. On the other hand, the number of PEs is restricted
because of maximal implementation complexity. The performance can be further
improved by implementing more than one LNAU. The LNAUs perform
exponentiations simultaneously and separately, so the performance is proportional to
the number of LNAUs.



5.2 Performance figures 

When designing a new platform, many factors should be taken into account. First
of all, the targeted modulus lengths should be addressed. Different modulus
lengths behave differently regarding all parameters involved. The parameters
should be chosen in such a way that they are in overall performance the most 
beneficial for all modulus lengths. 

Let us consider an RSA encryption protocol with 1024 bits for 2 or 3 prime 
factors. In the first case p and q are 512 bits long and for the Multiprime case 
p, q and r are each 342 bits long. The influence of the number of PEs on the 
performance in both of these cases is displayed in Figure 1. 



Fig. 1. Influence of the number of PEs (horizontal axis: #PEs; curves: 342-bit and 512-bit moduli)



In this graph the peaks represent the extreme values of the ceiling function 
for the number of rounds. For example the maximal value between 5000 and 6000 
encryptions per second stands for the beginning of the interval in which, for the 
given modulus size, only one round is needed. The second peak in the range of 
5000-6000 marks the beginning of the interval for which 2 rounds are required. 
When one is considering the most efficient (standard) RSA implementation the 
best choice for the number of PEs would be 32 or 16 as the performance curve 
has a maximum for these two values. In the case of Multiprime the preferred 
choice is 22 or 11 PEs. Having in mind doing both types of RSA cryptography 
on the same platform both curves should be taken into account (Figure 2). In 
this case the optimal performance is achieved for any of the following numbers: 
32 and 16 PEs. 
Fig. 2. Sum of the performance-curves for 2 and 3 primes (horizontal axis: number of PEs)



Figure 3 shows the performance curve for a modular exponentiation on a 
LNAU with 23 PEs and $\alpha = \beta = 16$. The effect of the ceiling functions can be
observed as jumps in the graph. 

Similar behavior of the performance figures can be found in [3]. 

Table 1 presents the timings of the core exponentiations for various modulus 
lengths. 





              mod. exp.   1024 bit N   2048 bit N
342 bits        1400         467           -
512 bits         731         365          18
683 bits         448           -         149
1024 bits        167         167          84
2048 bits         22           -          22

Table 1. Timings (operations per second) on the PCC-ISES with 2 LNAUs and clock-frequency of 50 MHz.



Fig. 3. Performance curve for 23 PEs and its trendline.

In the case of 1024 bit RSA, Multiprime will be faster by a factor of approximately
1.27, since the expected timing for three 342 bit exponentiations
is 467 per second and for two 512 bit exponentiations is 365 per second. These
timings do not include the overhead caused by the re-combination algorithm of
CRT.
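
The quoted factor can be reproduced from the first column of Table 1. The short Python calculation below assumes, as the table entries suggest, that the rate for a CRT-based operation is the single-exponentiation rate divided by the number of prime factors; the function name is illustrative.

```python
def crt_rate(exponentiations_per_second: float, num_primes: int) -> float:
    """Operations per second when one exponentiation per prime factor is needed."""
    return exponentiations_per_second / num_primes

multiprime = crt_rate(1400, 3)   # three 342-bit exponentiations
standard = crt_rate(731, 2)      # two 512-bit exponentiations
print(round(multiprime), round(standard), multiprime / standard)
```

The ratio comes out slightly below 1.28, in line with the factor of approximately 1.27 stated above.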



6 Remarks on bound and security 



6.1 Bound for the Montgomery parameter R 

In [16] the need of avoiding reduction after each multiplication is addressed. In 
practice this means that the output of the multiplication can be directly used 
as an input of the next Montgomery multiplication. We want to find a bound 
on R such that with X,Y < 2N the output of the Montgomery multiplication 
$T < 2N$. Write $R \ge kN$, then:

$T = \frac{XY + mN}{R} = \frac{XY}{R} + \frac{mN}{R} < \frac{4}{k} N + N = \left(\frac{4}{k} + 1\right) N$    (2)

where $m = (XY \bmod R) N' \bmod R$.

Hence, $T < 2N$ for $k \ge 4$, implying $4N \le R$. To guarantee the existence
of the modular inverse of R, R and N should be relatively prime. This excludes
$4N = R$.

The result of a Montgomery multiplication $XYR^{-1} \bmod N < 2N$ when
$X, Y < 2N$ and $R > 4N$.

This is the same result as disclosed in [16]. The final round in the modular
exponentiation is the conversion to the integer domain, i.e. calculating the
Montgomery function of the last result and 1. The same arguments as above
prove that this final step remains within the following bound: $\mathrm{Mont}(T, 1) \le N$.
In practice, $\mathrm{Mont}(T, 1) = N$ will never occur, since the exponentiated value is nonzero modulo N.
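
The bound can be exercised with a small software model. The Python sketch below is not the systolic-array implementation; it only mimics the arithmetic with $R = 2^{n+2} > 4N$ and checks that every intermediate result stays below $2N$ without any conditional subtraction. The function names and the choice of a left-to-right square-and-multiply loop are assumptions of this sketch.

```python
def mont_setup(N: int):
    """Choose R = 2^(n+2) > 4N and N' = -N^(-1) mod R for an odd n-bit modulus N."""
    assert N % 2 == 1
    R = 1 << (N.bit_length() + 2)
    N_prime = (-pow(N, -1, R)) % R
    return R, N_prime

def mont(x: int, y: int, N: int, R: int, N_prime: int) -> int:
    """Mont(x, y) = x*y*R^(-1) mod N up to a multiple of N, with no final subtraction."""
    m = ((x * y) % R) * N_prime % R
    return (x * y + m * N) // R          # exact division, result < 2N if x, y < 2N

def mont_exp(x: int, e: int, N: int) -> int:
    """Left-to-right square-and-multiply on Montgomery residues."""
    R, N_prime = mont_setup(N)
    x_bar = mont(x, (R * R) % N, N, R, N_prime)   # x * R mod N, possibly plus N
    acc = R % N                                    # Montgomery form of 1
    for bit in bin(e)[2:]:
        acc = mont(acc, acc, N, R, N_prime)
        if bit == "1":                             # depends on the exponent, not on a size check
            acc = mont(acc, x_bar, N, R, N_prime)
        assert acc < 2 * N                         # bound from (2)
    return mont(acc, 1, N, R, N_prime)             # back to the integer domain, at most N

N = 2147483647 * 1000000007
print(mont_exp(5, 65537, N) == pow(5, 65537, N))
```

The only data-dependent branch left in this model is the one on the exponent bits; the comparison with N after each multiplication is no longer needed.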

6.2 Security issues 

The security of the RSA cryptosystem depends on the difficulty of factoring the
modulus N. The running time of all known factoring algorithms depends either
on the size of the factors or only on the size of the modulus N. Hence,
when looking at the security of Multiprime one should be especially concerned
about the first type of factoring algorithms. The currently fastest algorithm of
this type is the Elliptic Curve Method (ECM). Its asymptotic running time is
$\exp(O((\ln p)^{1/2} \cdot (\ln \ln p)^{1/2}))$ [14]. For example, the ECM finds a 167-bit factor
of a 768-bit number with the probability 0.63 after spending 6200 MIPS-years,
under the assumption that such a factor exists for a two prime modulus ([12]).
However, if there are more than two primes, this probability is even higher.
Therefore, the suggested lower bounds for RSA key sizes as in [12] should be carefully
reconsidered in the case of Multiprime.

When considering side channel attacks, most attention is directed to timing 
([6]) and power analysis attacks ([11]). Ever since P. Kocher introduced a new
type of attack ([10]), the so-called Differential Power Analysis attack (DPA for
short), a reasonable amount of research has been done on this subject. The same
author published in 1996 ([9]) one of the relevant papers on time-difference 
based attacks. Namely, computations performed in non-constant time (usually 
because of performance optimisations) may leak secret key information. This 
observation is the basis for timing attacks. On the other hand, power analysis 
based attacks use the fact that the power consumed at any particular time 
during a cryptographic operation is related to the function being performed and 
(possibly sensitive) data being processed. 

The benefit of using the method of Montgomery for multiplication is evident 
for all types of algorithms. The reason for that is the following: a modular reduc- 
tion step may also be vulnerable and in the case of MMM at most one modular 
reduction is introduced for every multiplication or per step of exponentiation. 
In our implementation even these reductions are excluded. By use of an optimal 
upper bound the number of iterations required in the MMM can be reduced 
([17]). More precisely, some savings in hardware that have been included in our 
architecture avoid the conditional statements while performing the exponenti- 
ation. In that way, our implementation of modular exponentiation operates in 
constant time, which is presumed to significantly reduce the potential risk of the 
timing attack. However, while using CRT the modular multiplication in general 
may need a final reduction. 

7 Conclusions 

We have described an implementation of the RSA cryptosystem in hardware using 
a systolic array architecture. The method used for modular multiplication is 
Montgomery’s, which is proved to be more secure in hardware when considering 
timing attacks. We showed the possible gain of Multiprime compared to standard 
RSA, which depends on the platform. 

Abstract. In this paper the authors present a statistical test for the strict 
avalanche criterion (SAC), a property that cryptographic primitives such as 
block ciphers and hash functions must have. A random permutation should also 
behave as a good random number generator when, given any initial input, its 
output is considered part of a pseudorandom stream and then used as an input 
block to produce more output bits. Using these two ideal properties, we 
construct a test framework for cryptographic primitives that is shown at work on 
the block cipher TEA. In this way, we are able to distinguish reduced-round 
versions of it from a random permutation. 



1 Introduction 

Many cryptographic primitives (a cryptographic hash function, a block cipher with any 
fixed key, etc...) need to behave as random permutations to be considered usable. In 
fact, any method of distinguishing them from a random permutation is considered a 
weakness and the result of a successful cryptanalysis, even though the results may not 
be directly used in the recovery of plaintext or key bits. In this paper, we will present a 
method to show how reduced round versions of the block cipher TEA can be distin- 
guished from a random permutation. 



1.1 Testing for the Strict Avalanche Criterion 

Any random permutation f must have the SAC property, which we can verify with a 
statistical test. If f has the SAC, then it should pass the test that this pseudocode 
describes: 

While (#Texts < NUMTEXT) 
{ 
    Randomly generate an input vector V; 
    Randomly choose a position p in V; 
    Flip the content of the position p in V to get V'; 
    Calculate h, the Hamming distance between f(V) and f(V'), 
    and increase the number observed_values[h]; 
}; 

Compute the Chi-Square statistic t over the results 
stored in observed_values[h]. 

Perform a hypothesis contrast to test if the observed 
distribution (the t value) is consistent with the expected 
probability distribution, which should be a Binomial(n, 0.5), 
n being the length of the output of f. 
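
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: `hash_based` (64 bits of SHA-256) is only a stand-in for a well-mixing primitive, `identity` is a deliberately bad one, and the statistic is summed over all n + 1 possible distances without any binning.

```python
import hashlib
import random
from math import comb


def sac_chi_square(f, n_bits, num_text, rng):
    """Chi-square statistic of the Hamming distances between f(V) and f(V'),
    measured against the Binomial(n_bits, 0.5) distribution."""
    observed = [0] * (n_bits + 1)
    for _ in range(num_text):
        v = rng.getrandbits(n_bits)          # random input vector V
        p = rng.randrange(n_bits)            # random bit position p
        v_flipped = v ^ (1 << p)             # V' = V with bit p flipped
        h = bin(f(v) ^ f(v_flipped)).count("1")
        observed[h] += 1
    t = 0.0
    for h in range(n_bits + 1):
        expected = num_text * comb(n_bits, h) / 2 ** n_bits
        t += (observed[h] - expected) ** 2 / expected
    return t


def hash_based(v):
    """Hypothetical stand-in primitive: first 64 bits of SHA-256 of the input."""
    digest = hashlib.sha256(v.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def identity(v):
    """A function with no avalanche at all: every distance is exactly 1."""
    return v
```

For example, `sac_chi_square(identity, 64, 1000, random.Random(0))` is astronomically large, because every observation falls in the bin h = 1 whose expected count is almost zero.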

1.2 Random Permutations Generate Random Numbers 

Additionally, if f is a random permutation, then independently of the initial vector 
$I_0 = IV$ the recurrence relation: 

$I_0 = IV$ 

$I_{i+1} = f(I_i)$ 

must produce a sequence $(I_1, I_2, \ldots)$ of essentially random numbers. 

1.3 Testing SAC with Autofeeding 

This sequence of random numbers can be seen as a stream of random bits, that could 
be used, for example, to generate V and p in the SAC test described above, a technique 
we call autofeeding. In this way, we will essentially double-check the strength of the 
cryptographic primitive. This can be used to mount a framework to test functions. 

In this way, any result statistically distinguishable from the expected one will prove that 
the function f does not have the SAC or is not a good random number generator. In any 
case, it will be a method of effectively distinguishing f from a random permutation and 
implies f should be considered not adequate for cryptography. 
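
A minimal sketch of autofeeding is given below. It makes several assumptions that the text does not fix: TEA is written in its usual form of cycles (two Feistel steps each, 32 cycles for the full cipher), which may not correspond exactly to what is called a round here; the key is arbitrary; and the position p is simply taken as the stream value modulo 64.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9


def tea_encrypt(block, key, cycles):
    """Encrypt a 64-bit block with TEA reduced to `cycles` cycles (32 = full TEA)."""
    v0, v1 = block >> 32, block & MASK32
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK32
    return (v0 << 32) | v1


def autofed_distances(f, iv, num_text):
    """Histogram of Hamming distances where both V and p are taken from the
    stream I_{i+1} = f(I_i) produced by the function under test itself."""
    observed = [0] * 65
    state = iv
    for _ in range(num_text):
        state = f(state)
        v = state                      # next stream block becomes V
        state = f(state)
        p = state % 64                 # next stream block selects the position
        h = bin(f(v) ^ f(v ^ (1 << p))).count("1")
        observed[h] += 1
    return observed


def mean_distance(observed):
    """Average Hamming distance of a histogram (32 is the ideal for 64 bits)."""
    return sum(h * c for h, c in enumerate(observed)) / sum(observed)
```

The resulting histogram can then be compared with the Binomial(64, 0.5) distribution through the chi-square statistic, exactly as in the test of Section 1.1.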



2 Results 



We have tested reduced-round versions of the block cipher TEA with the procedure 
described above. For this particular case, the output of TEA has length 64 bits, so the 
Hamming distance between f(V) and f(V’) is a random variable that should take any 
value k between 0 and 64 with probability 

$P(h = k) = \frac{64!}{k!\,(64-k)!\,2^{64}}$ 

that is, the distribution of the Hamming distance between f(V) and f(V’) should be 
binomial with n = 64 and p = 0.5. 

To test if this is consistent with the observed results, we performed 10 tests and 
calculated their average, as seen in the tables below: 

Table 1. TEA with 1 round (average of 10 tests) with autofeeding 

# Tests    Average 
2^6          29.93 
2^7         106.41 
2^8         240.65 



Table 2. TEA with 2 rounds (average of 10 tests) with autofeeding 

# Tests    Average 
2^6          29.75 
2^7         104.19 
2^8         231.61 



Table 3. TEA with 3 rounds (average of 10 tests) with autofeeding 

# Tests    Average 
2^6          16.91 
2^7          55.66 
2^8         163.55 
2^9         498.02 



Table 4. TEA with 4 rounds (average of 10 tests) with autofeeding 

# Tests    Average 
2^6           3.20 
2^7          10.20 
2^8          22.79 
2^9          32.01 
2^10         72.62 
2^11        118.33 
2^12        248.20 

Table 5. TEA with 5 rounds (average of 10 tests) with autofeeding 

# Tests    Average 
2^6           5.32 
2^7          12.06 
2^8          17.20 
2^9          19.33 
2^10         19.24 
2^11         23.58 
2^12         21.17 
2^13         22.90 
2^14         23.99 
2^15         26.65 
2^16         26.50 
2^17         27.89 
2^18         30.22 
2^19         31.15 
2^20         38.75 
2^21         42.42 
2^22         53.15 
2^23         66.75 
2^24         92.50 
2^25        132.56 



All the tables conclude approximately when the average of 10 different tests is 
higher than 93.216 (in light grey), which is the critical value of the chi-square 
distribution at the 99% level with 64 degrees of freedom. 
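
The threshold 93.216 can be reproduced approximately without statistical tables. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile, which is a substitute technique chosen here for illustration and is not mentioned in the paper; it agrees with the tabulated value to within a few hundredths.

```python
from statistics import NormalDist


def chi2_critical(df, level):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root approximation)."""
    z = NormalDist().inv_cdf(level)      # standard normal quantile
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def is_distinguished(average_statistic, df=64, level=0.99):
    """True when an averaged statistic exceeds the critical value."""
    return average_statistic > chi2_critical(df, level)
```

With this helper, the last entry of Table 5 (132.56) is flagged as a significant deviation, while the entry before it (92.50) is not.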



3 Conclusions 

As seen in the tables above, we can distinguish the block cipher TEA from a random 
permutation for up to 5 rounds. As expected, increasing the number of rounds also 
increases the number of encrypted texts needed to observe a significant deviation from 
the ideal behaviour. 

Currently we are working on 6-round TEA and we feel quite sure we will also be 
able to distinguish it from a random permutation. This is probably the limit of this 
approach on the TEA block cipher. For 7 rounds or more, we believe this approach is 
not capable of distinguishing TEA from a random permutation, at least in this form. 

In any case, we feel our proposal is useful and will probably produce other interesting 
results when used to test other cryptographic primitives. 

Abstract. This paper presents a cryptanalytic attack on the RSA cryptosystem. 
The method, the Multiple Residue Method (MRM), makes use of an algorithm 
which determines the value of $\phi(n)$ and hence, for a given modulus n where 
$n = p \times q$, the prime factors can be uncovered. This algorithm calculates and 
stores all possible residues of p, q and (p + q) in different moduli. It then 
applies the Chinese Remainder Theorem (CRT) to different combinations of 
residues until the correct value is calculated [6]. Further properties in relation to 
this structure show that improvements in the search process, within the residues 
of all parameters involved, can be effectively achieved. Besides, it has been 
established that the security of the RSA is no greater than the difficulty of 
factoring the modulus n into a product of two secret primes p and q. However, the 
MRM approaches the factorisation problem from a different angle. This method 
is aimed at finding $\phi(n)$ in $O(2^{-j} \times n)$, where j is the number of 
prime moduli. It may also be directed towards the computation of the 
sum (p + q) and, in the realistic case for the RSA, reduces to $O(2^{-j} \times \sqrt{n})$. 



1 Introduction 

The concept of public-key cryptography, introduced in 1976 by Diffie and Hellman, 
provides a proper solution to the problem of key distribution. Their scheme makes use 
of the apparent difficulty of computing logarithms over a finite Galois field GF(p) 
where p is a prime [1]. Since then, many implementations of public-key cryptography 
have been proposed. One example is the RSA scheme. In the RSA public key 
cryptosystem [2], two large primes p and q are chosen randomly to give n where 
$n = p \times q$. An encryption key, e, is the inverse of the decryption key, d, mod $\phi(n)$. 
The encryption key, e, is selected from the interval $[2, \phi(n) - 1]$, where $\phi(n)$ is 
the Euler totient function of n and is given by 

$\phi(n) = (p - 1)(q - 1)$ (1) 

e and $\phi(n)$ must be relatively prime. Also e and n are made public but the 
decryption key, d, is private and is chosen such that 

$e \times d \equiv 1 \pmod{\phi(n)}$ (2) 

In practice p and q are chosen to be $O(\sqrt{n})$ and therefore general factorisation 
methods tend to be more successful than specific ones. These two terms, general and 
specific, will be explained in the next section. 
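
Equations 1 and 2 can be illustrated with a toy key. The sketch below is only an example: the primes 61 and 53 and the exponent 17 are arbitrary small values chosen for readability, far too small for any real use, and `make_keys` is a hypothetical helper name.

```python
from math import gcd


def make_keys(p, q, e):
    """Return (n, phi, d) for primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)          # Equation 1
    if gcd(e, phi) != 1:
        raise ValueError("e and phi(n) must be relatively prime")
    d = pow(e, -1, phi)              # Equation 2: e * d = 1 (mod phi(n))
    return n, phi, d


n, phi, d = make_keys(61, 53, 17)
ciphertext = pow(65, 17, n)          # encrypt the message 65 with (e, n)
recovered = pow(ciphertext, d, n)    # decrypt with the private exponent d
```

Anyone who learns $\phi(n)$ can compute d in exactly the same way, which is why the attack described in this paper targets $\phi(n)$ or, equivalently, $(p + q)$.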

It is generally assumed that the system’s security relies on the difficulty of 
factoring n. Nevertheless, given the factorisation problem and the length of n, of the 
order of 200 digits, it is equally hard to compute the value of $\phi(n)$ given n [2]. 
Although some of the attacks on the RSA consider finding the value of d by studying 
different ciphertexts, others are aimed at evaluating the two prime factors, p and q. 
One such attack can be used to factor the modulus when the prime factors of either 
$p - 1$ or $q - 1$ are all small [3]. Similarly, another attack can also factor 
the modulus when the prime factors of either $p + 1$ or $q + 1$ are small, but using a 
different method [4]. 

In this paper a new search pattern geared towards finding $(p + q)$ and/or $\phi(n)$ for 
a given modulus n is presented. Each time, a combination of the residues of $(p + q)$ 
and/or $\phi(n)$ is selected (from different prime fields), has the CRT applied using 
precomputed tables, and is then checked against the square root function in order to find 
the correct value of $(p + q)$ and/or $\phi(n)$. Investigation has shown that there are 
different search patterns through all possible residues that minimise the search effort: one 
of these search patterns is discussed in the following sections. 

Since the RSA system was introduced there have been many attacks aimed at 
different weaknesses in the system. To nullify those attacks some constraints have been 
put on the selection of the system parameters. Nevertheless, a general 
method based on the multiple residues of all parameters involved in the RSA has been 
introduced [6]. This method discloses the secret parameters of the RSA system 
regardless of how carefully they may have been chosen. The method is based on the fact 
that the factorisation problem is effectively decomposed into an arbitrary number of 
shorter finite field factorisations of the different residues of n. It is apparent that the 
properties of both $\phi(n)$ and $(p + q)$ may be exploited in terms of the reduced number 
of possible combinations of residues. The computation of this algorithm can be readily 
mapped onto a parallel processing architecture and tables of precomputed residues 
will avoid performing any real-time factorisation operation. 

The proof of this structure is based on a Multiple Residue Method (MRM) which 
relies upon the application of various number theoretic concepts and properties of the 
prime numbers, moduli, congruences, and the Chinese Remainder Theorem (CRT) 
[6]. Before discussing the search pattern through the MRM it is necessary 
to introduce this method. 



2 The Multiple Residue Method 

An odd positive integer factoring algorithm can be loosely classified as being either 
specific or general [1]. A specific method uses some special property of the factors 
being searched for and first discovers those which possess that property. For instance, 
the divide and factor method [1] and Pollard’s $\rho$-method [2] will usually find factors 
which are small before those that are large. Pollard’s $p - 1$ method [2] will 
converge towards a factor p for which the prime factors of $p - 1$ are small. On the other 
hand, in a general method, such as the Multiple Residue Method (MRM), the probable 
computation time to find the factors is independent of their magnitude. For example, in a 
perfect general method as much time will be spent in finding the small factors as in 
finding any of the other factors. 

The basis of the Multiple Residue Method (MRM) is to write the residue of an odd 
composite number, $n = p \times q$, in terms of the residues of p and q, in different prime 
fields 



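$$
\begin{aligned}
(n)_{m_1} &= (p_{11} \cdot q_{11})_{m_1} \\
(n)_{m_2} &= (p_{21} \cdot q_{21})_{m_2} \vee \dots \vee (p_{2 i_2} \cdot q_{2 i_2})_{m_2} \\
&\;\;\vdots \\
(n)_{m_k} &= (p_{k1} \cdot q_{k1})_{m_k} \vee \dots \vee (p_{k i_k} \cdot q_{k i_k})_{m_k}
\end{aligned} \qquad (3)
$$

where $(n)_{m_k}$ denotes the residue of n modulo $m_k$, and $m_k$ is the k-th prime, starting 
from 2, 3, 5, ... Also $i_k$ is the number of residue pairs corresponding to the k-th prime. 
Equation 3 may then be used to calculate the values of residue pairs for different 
moduli, examples of which are shown in Tables 1, 2, 3 and 4. 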
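
The residue-pair tables that follow from Equation 3 can be generated mechanically. The sketch below is an illustration only; `residue_pairs` is a hypothetical helper that lists, for a prime modulus m, every unordered pair of non-zero residues whose product gives each residue of n.

```python
def residue_pairs(m):
    """Map each non-zero residue r of n modulo the prime m to the unordered
    pairs (a, b), a <= b, of non-zero residues with a * b = r (mod m)."""
    table = {r: [] for r in range(1, m)}
    for a in range(1, m):
        for b in range(a, m):
            table[(a * b) % m].append((a, b))
    return table


pairs_mod_5 = residue_pairs(5)
# pairs_mod_5[1] lists the pairs (1, 1), (2, 3) and (4, 4)
```

Each row holds either (m - 1)/2 or (m + 1)/2 pairs, depending on whether the residue of n is a quadratic residue modulo m.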




A New Search Pattern in Multiple Residue Method (MRM) 






Table 1. Combinations of the residue pairs in different prime moduli, m_j = 2 

(n)_2    (p, q)_2 
1        (1, 1) 



Table 2. Combinations of the residue pairs in different prime moduli, m_j = 3 

(n)_3    (p, q)_3 
1        (1, 1)   (2, 2) 
2        (1, 2)   - 



Table 3. Combinations of the residue pairs in different prime moduli, m_j = 5 

(n)_5    (p, q)_5 
1        (1, 1)   (2, 3)   (4, 4) 
2        (1, 2)   (3, 4)   - 
3        (1, 3)   (2, 4)   - 
4        (1, 4)   (2, 2)   (3, 3) 



Table 4. Combinations of the residue pairs in different prime moduli, m_j = 7 

(n)_7    (p, q)_7 
1        (1, 1)   (2, 4)   (3, 5)   (6, 6) 
2        (1, 2)   (3, 3)   (4, 4)   (5, 6) 
3        (1, 3)   (2, 5)   (4, 6)   - 
4        (1, 4)   (2, 2)   (3, 6)   (5, 5) 
5        (1, 5)   (2, 6)   (3, 4)   - 
6        (1, 6)   (2, 3)   (4, 5)   - 



Also from Equation 3, a similar structure is derived for $\phi(n)$ and $(p + q)$, the result 
of which is stored in a number j of residue tables such as that shown in Table 5 for a 
modulus of 7. 

Table 5. Residues of (p + q) in mod 7 

(n)_7    (p + q)_7 
1        2   6   1   5 
2        3   6   1   4 
3        4   0   3   - 
4        5   4   2   3 
5        6   1   0   - 
6        0   5   2   - 



To reach the actual value of (p + q) using these tables, all potential values for 
(p + q) are first evaluated by applying the CRT to appropriate combinations of these 
values from the tables. Each combination consists of a set of elements, each of which 
is drawn out of a separate residue table. 

The values of (p + q) should be determined by checking the solution of the 
following equation for a perfect square 

$p - q = \sqrt{(p + q)^2 - 4n}$ (4) 



This task can be effectively carried out by using the square-rooting algorithm intro- 
duced in [8]. 
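
The search described above can be sketched for a toy modulus as follows. This is a brute-force illustration under stated assumptions, not the authors' search pattern or the square-rooting algorithm of [8]: the residue tables are built on the fly rather than precomputed, the product of the chosen moduli is assumed to exceed (p + q), and the perfect-square test uses an integer square root.

```python
from itertools import product
from math import isqrt, prod


def sum_residues(n, m):
    """Residues of p + q modulo the prime m that are compatible with n = p*q."""
    return sorted({(a + b) % m
                   for a in range(1, m) for b in range(1, m)
                   if (a * b - n) % m == 0})


def crt(residues, moduli):
    """Combine residues for pairwise coprime moduli into one value mod their product."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        part = big_m // m
        total += r * part * pow(part, -1, m)   # modular inverse, Python 3.8+
    return total % big_m


def factor_by_sum(n, moduli):
    """Try every residue combination; return (q, p) with q <= p, or None."""
    tables = [sum_residues(n, m) for m in moduli]
    for combo in product(*tables):
        s = crt(combo, moduli)                 # candidate for p + q
        disc = s * s - 4 * n                   # equals (p - q)^2 for the right s
        if disc < 0:
            continue
        t = isqrt(disc)
        if t * t == disc and (s + t) % 2 == 0:
            p, q = (s + t) // 2, (s - t) // 2
            if q > 1 and p * q == n:
                return q, p
    return None
```

For n = 3233 and the moduli 2, 3, 5, 7 only six combinations have to be tried, and the one with residues (0, 0, 4, 2) yields p + q = 114 and hence the factors 53 and 61.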

The same technique can be applied to the possible residues of $p - 1$ and $q - 1$, for a 
given modulus n, to obtain possible values of the Euler totient function, $\phi(n)$, in order 
to determine the correct $\phi(n)$ and thus break the RSA cryptosystem. Similarly the 
value of $\phi(n)$ can be checked using the following equation 

$p - q = \sqrt{((n - 1) - \phi(n))^2 - 4\phi(n)}$ (5) 
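
As a worked check of Equation 5, the sketch below recovers the two primes from a candidate value of $\phi(n)$. The helper name and the toy modulus 3233 = 61 × 53 are illustrative assumptions only.

```python
from math import isqrt


def factors_from_phi(n, phi):
    """Return (q, p) with q <= p if phi is the true totient of n = p*q, else None."""
    s = n + 1 - phi                         # p + q
    disc = (n - 1 - phi) ** 2 - 4 * phi     # (p - q)^2 by Equation 5
    if disc < 0:
        return None
    t = isqrt(disc)
    if t * t != disc or (s + t) % 2 != 0:
        return None                         # not a perfect square: wrong candidate
    p, q = (s + t) // 2, (s - t) // 2
    return (q, p) if p * q == n else None
```

For n = 3233 and $\phi(n)$ = 3120 the expression under the root is 112² - 4 · 3120 = 64, so p - q = 8 and p + q = 114.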



However, the laborious part of this method is finding the right combination of 
residues. In the following section improvements are made to reduce the computational 
effort required to determine $(p + q)$ or $\phi(n)$, particularly the search pattern through 
the residues. 

To determine the computational effort required in finding the actual value of 
$(p + q)$, we need to estimate the number of possible combinations, C. By referring to 
the precomputed Tables 1, 2, 3, and 4 it can be seen that the number of possible 
residues $i_k$ in each row of the table for the prime $m_k$ is 

$(m_k - 1)/2 \le i_k \le (m_k + 1)/2$ (6) 

Hence, 

$C = \prod_{k=1}^{j} \frac{m_k \pm 1}{2}$ (7) 

where $\pm$ indicates the maximum and minimum number of elements in different 
prime moduli. However, the results of factoring several large random numbers show 
that in practice C is given by 

$C < 2^{-j} \prod_{k=1}^{j} (m_k - 1)$ (8) 

or 

$C < 2^{-j} \times m \times S$ (9) 

In order to calculate the value of C, the value of S is computed first by using 
Mertens’ theorem [7]. This states that 

$S = \prod_{2 \le m_k \le m_j} \left(1 - \frac{1}{m_k}\right) \approx \frac{e^{-\gamma}}{\ln m_j} \approx \frac{0.5}{\ln m_j}$ (10) 

where $\gamma$ is Euler’s constant given by 

$\gamma = \lim_{m_j \to \infty} \left(1 + \frac{1}{2} + \dots + \frac{1}{m_j} - \ln m_j\right)$ (11) 

Therefore, the maximum number of combinations required to evaluate $(p + q)$ is 

$C < 2^{-j} \times \frac{0.5\,m}{\ln m_j}$ (12) 

In order to carry out the square rooting operation given in Equation 4 the overall 
modulus given by 

$m = \prod_{k=1}^{j} m_k$ (13) 

needs to exceed $(p - q)$, which in the case of the RSA is of $O(\sqrt{n})$. Hence 

$C < O\left(2^{-j} \times \frac{\sqrt{n}}{\ln m_j}\right)$ (14) 

Table 6 shows the maximum number of possible combinations for different orders of n. 

Table 6. Order of operations for a given n 

O(n) in digits    Maximum O(operations) 
10                0.3 × 10^… 
20                1.8 × 10^… 
50                4.7 × 10^… 
100               1.0 × 10^… 
200               1.6 × 10^… 
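
The approximation used for S in Equation 10 can be checked numerically. The sketch below is an illustration only: it sieves the primes up to a bound, forms the product of (1 - 1/m_k), and compares it with $e^{-\gamma}/\ln m_j$; the helper names and the bound 10^4 are arbitrary choices.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329


def primes_up_to(x):
    """Simple sieve of Eratosthenes returning all primes <= x."""
    sieve = [True] * (x + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def mertens_product(x):
    """S = product of (1 - 1/m_k) over all primes m_k <= x."""
    s = 1.0
    for p in primes_up_to(x):
        s *= 1.0 - 1.0 / p
    return s


ratio = mertens_product(10 ** 4) * log(10 ** 4)   # should be close to exp(-gamma)
```

Since $e^{-\gamma}$ is about 0.56, replacing it by 0.5 in Equation 10 is a rounding of the constant rather than an exact identity.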



3 Search Pattern 

A comparison study has been made between the MRM algorithm and Pollard’s $p - 1$ 
method [3]. In this comparison, composites are selected such that some of them favour 
Pollard’s $p - 1$ method and some of them do not. The desirable composites for 
Pollard’s $p - 1$ algorithm are those for which $p - 1$ has small factors. The undesirable 
composites are those for which $p - 1$ is equal to $2 \times p'$, where $p'$ is a large prime, and 
also those for which $p - 1$ is equal to the product of some small primes raised to large 
powers, i.e. $p - 1 = 2^{\alpha} \times 3^{\beta} \times 5^{\gamma} \times \cdots$, where $\alpha$ and $\beta$ are very large and $\gamma$ is large. 
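
For reference, the behaviour of Pollard’s $p - 1$ method on favourable and unfavourable composites can be sketched as follows. This is the textbook form of the method with base 2 and a factorial exponent, given as an illustration; it is not the implementation used in the comparison study, and the two toy composites are chosen here only as examples.

```python
from math import gcd


def pollard_p_minus_1(n, bound):
    """Textbook Pollard p-1: raise 2 to the power bound! modulo n, step by step,
    and return a non-trivial factor of n as soon as gcd(a - 1, n) reveals one."""
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, n)
        g = gcd(a - 1, n)
        if 1 < g < n:
            return g
    return None
```

With n = 299 = 13 × 23 the factor 13 appears at bound 4, because 13 - 1 = 2² × 3 is smooth. With n = 1081 = 23 × 47, where 23 - 1 = 2 × 11 and 47 - 1 = 2 × 23, a bound of 10 finds nothing.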

However, all the arithmetic operations involved in the MRM algorithm are performed 
on numbers of size n. Since the labour of multiplying or dividing large numbers 
normally increases with the square of the length of the numbers involved, this means that 
one cycle of the MRM algorithm is roughly 5 times as fast as one cycle of 
Pollard’s algorithm. 

The algorithm requires very little space to store the precomputed tables of possible 
residues in different moduli and these tables could be easily mapped onto a parallel 
processing architecture. Another important feature of the MRM algorithm is that the 
only possible outcomes are either the number is composite and its factors are deter- 
mined or the number is prime. This is unlike some other general factorisation methods 
where infinite loops may occur revealing nothing about the number. 

Furthermore, one may question the possibility of reducing the number of combina- 
tions. If so, improvements may be possible in order to speed up this method. 

It was shown that the MRM algorithm is comparable to Pollard’s $p - 1$ method for 
the case when the composite number does not favour Pollard’s method. The MRM 
algorithm can go through the residues of $p - 1$ and $q - 1$ in different prime fields in a 
pseudo-random manner. This is based on an empirical observation, which shows that it is 
unlikely for $\phi(n)$ to have the same residue in four or more consecutive prime moduli. 

It is also shown that $\phi(n)$ cannot have factors over the entire prime moduli, $m_j$, or 
more than two or three consecutive primes in the prime moduli. However, the search 
for the value of $\phi(n)$ using the MRM can be carried out in an organised fashion 
according to various strategies, the purpose of each strategy being to find the right 
residues of $p - 1$ and $q - 1$. For instance, the search for the right residues of $p - 1$ and 
$q - 1$ could start from the left side of the final table, shown in Fig. 1 (in which all 
possible residues of $p - 1$ and $q - 1$ are stored in different prime fields knowing n; 
circles indicate residues). In the final table of the MRM algorithm a different route can 
be selected for the search through values of $\phi(n)$ because all the routes are 
independent of each other. For instance, if the search starts from the routes in which $p - 1$ 
and $q - 1$ have small factors then this method becomes similar to Pollard’s $p - 1$ method. 

Furthermore, a high level of efficiency can be achieved by parallel implementation 
of MRM, due to the fact that each combination of residues (each route) can be com- 
puted and tested entirely independently of each other. Therefore, MRM algorithm can 
be computed according to the degree of parallelism used. It means that the algorithm 
can be calculated in semi parallel fashion in which the final part of the algorithm that 
contains independent instructions are grouped in k groups, where k is the number of 
processors. 



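[Figure 1: grid of possible residues (circles) for each of the prime moduli 2, 3, 5, 7, 
11, 13, 17, with dark dots marking the selected residues; axis label: number of 
residues in different primes.] 
Fig. 1. Illustrating possible residues of (p + q) in different prime fields. 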

Moreover, the search can start from the combination of elements most likely to be 
the correct $\phi(n)$. This method of searching involves going through the combinations 
having the largest probabilities of being $\phi(n)$ down to the combinations with least 
probability. For example, see Figure 1, which illustrates the method of searching the 
various combinations of residues in different prime fields. Dark dots indicate a 
selection of highest probability of $(p + q)$. From Figure 1 the lowest probability of $(p + q)$ 
can be determined by selecting the residue combination all from column one. 

The algorithm requires very little space to store the precomputed tables of possible 
residues and these tables could be readily mapped onto a parallel processing 
architecture. It would thus constitute a general factorisation method since success in factoring 
a given number, $n = p \times q$, would then be independent of the size of the factors. 






S.J. Tabatabaian et al. 



4 Conclusion 

In this paper suggestions have been made in order to improve the algorithm based on 
the multiple residue method. The MRM algorithm has been introduced as a general 
approach to break the RSA system and, more generally, to factorise a modulus 
n which is the product of two primes p and q. This method decomposes the 
factorisation problem into an arbitrary number of residues of n, where $n = p \times q$, and makes 
use of the CRT to evaluate the Euler totient function or the sum of the two primes. 

It was discussed that the MRM algorithm is comparable to Pollard’s $p - 1$ method 
for the case when the composite number does not favour Pollard’s method, i.e. $p - 1$ 
has a large prime factor, where $n = p \times q$ and $p - 1 = 2 \times p'$. This is because 
the MRM algorithm goes through the residues of $p - 1$ and $q - 1$ in different prime 
fields in a pseudo-random manner according to various strategies, the purpose of each 
strategy being to find the right residues of $p - 1$ and $q - 1$. 

It was also shown that because of the inherently parallel nature of the MRM 
algorithm, it is easily possible to group the residues by allocating the task of each group to 
a processor. Each processor then evaluates $\phi(n)$ from the combination of residues 
according to its own defined search strategy. Consequently, not only is the order of the 
algorithm reduced but also the efficiency of search is increased. 

Experimental results show that a success rate of up to 95% is achievable using the MRM 
for determining $\phi(n)$. This is deemed possible when analysing a pre-selected range of 
residues through the parallelism technique across small to large prime numbers (j). 

Abstract. This paper proposes a new undeniable signature scheme which is 
based on the Chaum-van Antwerpen undeniable signature scheme. In this paper, we 
extend the Chaum-van Antwerpen undeniable signature scheme to take into account 
smart cards, which are known for their tamper-resistant feature. We also 
add an additional process of authenticating the signer to the verification and disavowal 
protocols of the Chaum-van Antwerpen undeniable signature scheme. By these 
modifications, attempts to repudiate or deny a valid signature can be prevented 
or detected with higher efficiency. With authentication and smart cards, our 
scheme can be used for settling disputes over forgeries of signatures, signature 
repudiations or fraudulent claimants. 



1 Introduction 

A digital signature of a message is a number which depends on some secret 
information known only to the signer, such as the signer’s secret key, and additionally on the 
content of the message to be signed [1]. Digital signatures have to be verifiable: if a 
dispute arises as to whether a party signed a document, caused either by a 
lying signer trying to repudiate his valid signature or by a fraudulent claimant, an 
impartial third party should be able to resolve the matter equitably, without requiring access 
to the signer’s secret information or private key. 

Generally, digital signature schemes consist of a signing algorithm and a verifying 
algorithm. That is, in common digital signature schemes, a signer participates only in 
making his signature, and the verification can be performed without the signer’s 
cooperation or notification. Accordingly, care has to be taken to prevent a signed 
digital message from being reused, or from being forged by analyzing the signing 
algorithm. Signers’ repudiation of their valid signatures and fraudulent claims 
must be prevented, too. 



' Sponsored and supported by Mobile Network Security Technology Research Center, Kyung- 
pook National University, Korea. 



B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 387-394, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 







L. Jongkook et al. 



Attacks based on analyzing an algorithm or a weak point of a protocol can be blocked by 
using a more powerful and secure cryptographic algorithm and protocol. However, digital 
signature schemes can be attacked by signers’ intentional denial or repudiation of their 
valid signatures, because digital signatures can be copied or verified without signers’ 
approval or notification. To provide functionality beyond authentication and 
non-repudiation, such schemes combine a basic digital signature scheme with a specific 
protocol, in most instances. 

To prevent signers’ repudiation or denial, Chaum and van Antwerpen introduced the 
undeniable signature scheme, which consists of a signing algorithm, a verification protocol 
and a disavowal protocol [1][2][3]. In this scheme, no signature can be verified without 
the signer’s cooperation and notification. Accordingly, signers can’t repudiate or deny 
their valid signatures, because they participate in the verification of their signatures. 
However, this scheme is still based on computation, so it remains computationally 
possible, with a very small probability, that an invalid signature is accepted as a 
valid signature or that a valid signature is denied. 

Accordingly, we propose a new undeniable signature scheme based on Chaum-van 
Antwerpen undeniable signature scheme, to lower or remove the probability of ac- 
cepting a wrong signature as valid. In our scheme, signing process is unchanged, how- 
ever, verification protocol and disavowal protocol are reinforced to make our scheme 
more reliable, by adding authentication using smart cards. Of course, our scheme is 
still based on computation, accepting a wrong signature as valid or denial of valid 
signature is still possible, with still less probability. 

The rest of this paper is organized as follows. Notations and assumptions related to our 
scheme are shown in section 2, and we present our new scheme in section 3. Then, we 
describe and analyze our new scheme in section 4. And, the conclusion is presented in 
section 5. 



2 Notations and Assumptions 

In this paper, S represents a signer, V is a verifier and ADV is an adversary in our 
undeniable signature scheme. All signatures have to be verified by V. Let $p = 2q + 1$ 
be a prime number such that q is a prime number. The discrete logarithm problem in 
$Z_p$ is assumed to be computationally infeasible [4]. $\alpha$, which belongs to $Z_p^*$, is an 
element of order q. Let $1 < a < (q - 1)$ and $\beta = \alpha^a \bmod p$, where a is S’ secret value. ID 
means identification information, which can consist of, for example, S’ secret information 
and unique information of S’ smart card. A secret key is maintained by V’s system. 
We assume that a pseudo-random number generator, or PRNG for short, exists and 
is available in V’s system. $A \to B: M$ means A sends a message M to B, and all data 
transmission is encrypted. Smart cards are tamper-resistant, so no one can get the 
contents of smart cards by improper means. If a smart card is removed while the 
verification protocol or disavowal protocol is proceeding, the protocol must be stopped. 

3 A New Undeniable Signature Scheme Using Smart Cards 

In our scheme, p, $\alpha$ and $\beta$, the public elements, are publicly known to S and V, who 
participate in our scheme. To use smart cards in our scheme, and to allow S to use his 
smart card, we store those public elements on S’ smart card. Accordingly, when S 
wants to make his signature, he has to insert his smart card into the card reader, because 
all public elements which are used to make his signature are stored on his smart card. 
Our scheme consists of four steps: registration, signing, verification and disavowal. 
In an undeniable signature scheme, the verification protocol and the disavowal protocol are 
important, because these two protocols prevent S from denying his valid signatures and 
V from accepting forgeries as valid signatures. The following is a detailed description 
of all steps in our scheme, from the registration step to the disavowal step. 

Registration step: To register, S has to submit his ID to V’s system. Then V’s system 
calculates the registration information REG for S as in Fig. 1: 



REG = (IDf^ mod p 



Fig. 1. Description of registration algorithm 

After calculating REG, V sends the registration information REG to S in a secure 
manner. Each REG is stored on V’s system and on each S’ smart card, and is used to 
authenticate each S. By registration, V can determine whether or not S is a valid user or 
signer of V’s system. This registration step can be done when S’ smart card is issued by V, or 
by registering V’s system on S’ existing smart card, for confidentiality. Accordingly, 
ADV can’t get S’ registration value REG from S’ smart card, because ADV can’t 
access the content of S’ smart card. This step differentiates our scheme from the Chaum-van 
Antwerpen undeniable signature scheme. 

Signing step: The signing algorithm is simple and remains unchanged from the Chaum-van 
Antwerpen undeniable signature scheme. However, it is important that a signature 
must be generated inside S’ smart card. All public elements needed by our 
scheme are stored on S’ smart card, except S’ secret information, such as S’ secret key or 
signing key, which must be input manually when the signature is made. Signing is 
performed as in Fig. 2: 



1. S calculates $y = x^a \bmod p$, where x is a message to be signed. 

2. S → V: (x, y) 

Fig. 2. Description of signing algorithm 



Verification step: This step is similar to the verification protocol of the Chaum-van 
Antwerpen undeniable signature scheme; that is, verification is done with S's 
cooperation. However, in our scheme, S has to be authenticated by V. This 
authentication differentiates our scheme from the Chaum-van Antwerpen undeniable 
signature scheme. Fig. 3 gives a detailed description of the verification protocol: 

1. V → S: $r_v$, where $r_v \in G$ is a random number, timer, or nonce chosen by V, 

2. S → V: (z, t), where $z = (r_v)^a \bmod p$ and $t = f(r_v, REG) \bmod (p-1)$, 

3. V tests whether the received t is a valid value. If t is valid, V continues the 
verification. 

4. V → S: $c = z\, y^{e_1} \beta^{e_2} \bmod p$, where $e_1$ and $e_2$ are random numbers in 
$\mathbb{Z}_q$ selected by V, 

5. S → V: $d = c^k \bmod p$, where $k = a^{-1} \bmod q$, 

6. V accepts y as a valid signature if and only if $d \equiv r_v\, x^{e_1} \alpha^{e_2} \pmod p$. 

Fig. 3. Description of verification protocol 

In Fig. 3, f is a one-way function known to S and V. For authentication, we add 
steps 1 to 3 to the verification protocol of the Chaum-van Antwerpen undeniable 
signature scheme. We use two values: t for validating S, and z for checking S's 
cooperation over his signature. 
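
The message flow of Fig. 3 can be traced with a small numerical sketch. Everything concrete here is an assumption made for illustration: the toy parameters p = 467, q = 233, the generator 4, the secret key, the REG value, and the use of SHA-256 as a stand-in for the unspecified one-way function f. It is not the authors' implementation, and the parameters are far too small for real use.

```python
import hashlib

# Toy parameters (illustrative only): p = 2q + 1 with p and q prime.
P, Q = 467, 233
ALPHA = 4                  # 4 is a square mod p, so it generates the order-q subgroup G
A = 101                    # signer's secret key a (hypothetical value)
BETA = pow(ALPHA, A, P)    # signer's public key beta = alpha^a mod p
REG = 123                  # hypothetical registration value shared by S and V


def f(r_v: int, reg: int) -> int:
    """Stand-in for the one-way function f (SHA-256 is an assumption)."""
    digest = hashlib.sha256(f"{r_v}|{reg}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1)


def sign(x: int) -> int:
    """Fig. 2: y = x^a mod p."""
    return pow(x, A, P)


def signer_auth(r_v: int):
    """Step 2: S answers the nonce with z = r_v^a mod p and t = f(r_v, REG)."""
    return pow(r_v, A, P), f(r_v, REG)


def signer_respond(c: int) -> int:
    """Step 5: d = c^k mod p with k = a^(-1) mod q."""
    k = pow(A, -1, Q)      # modular inverse, Python 3.8+
    return pow(c, k, P)


def verify(x: int, y: int, r_v: int, e1: int, e2: int) -> bool:
    """Steps 1 to 6 seen from V's side, with S simulated locally."""
    z, t = signer_auth(r_v)
    if t != f(r_v, REG):   # step 3: authentication check
        return False
    c = (z * pow(y, e1, P) * pow(BETA, e2, P)) % P   # step 4
    d = signer_respond(c)                            # step 5
    return d == (r_v * pow(x, e1, P) * pow(ALPHA, e2, P)) % P  # step 6


x = pow(ALPHA, 57, P)      # a message encoded as an element of G
r_v = pow(ALPHA, 77, P)    # V's nonce, also an element of G
print(verify(x, sign(x), r_v, 45, 4))
```

With an honest signer, a value y that is not x^a mod p passes step 6 only when e1 = 0, which is one way to see the 1/q acceptance probability discussed in the security analysis.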

Disavowal step: This step is essential to an undeniable signature scheme, because it 
lets V settle the matter when S repudiates his valid signature. We also add the 
authentication feature to this step; a full description of this step is shown in Fig. 
4: 

1. V → S: $r_v$, where $r_v \in G$ is a random number, timer, or nonce chosen by V, 

2. S → V: (z, t), where $z = (r_v)^a \bmod p$ and $t = f(r_v, REG) \bmod (p-1)$, 

3. V tests whether the received t is a valid value. If t is valid, V continues. 

4. V → S: $c = z\, y^{e_1} \beta^{e_2} \bmod p$, where $e_1$ and $e_2$ are random numbers in 
$\mathbb{Z}_q$ selected by V, 

5. S → V: $d = c^k \bmod p$, where $k = a^{-1} \bmod q$, 

6. V verifies that d is not congruent to $r_v\, x^{e_1} \alpha^{e_2} \pmod p$. 

7. V → S: $C = z\, y^{f_1} \beta^{f_2} \bmod p$, where $f_1$ and $f_2$ are random numbers in 
$\mathbb{Z}_q$ selected by V, 

8. S → V: $D = C^k \bmod p$, where $k = a^{-1} \bmod q$, 

9. V verifies that D is not congruent to $r_v\, x^{f_1} \alpha^{f_2} \pmod p$, 

10. V concludes that y is a forgery if and only if 

Fig. 4. Description of disavowal protocol 

In Fig. 4, f is a one-way function known to S and V, as in the verification protocol. 
If one of the tests at stage 6 or 9 fails, V can accept y as S's valid signature, and V 
can regard S as a liar, or believe that S is attempting to deny his valid signature. 
Moreover, if the test at stage 10 fails, then S must have used two different values of 
a at stage 6 and stage 9, or not followed the above process properly. 



4 Security Analysis 

Our scheme must be able to prevent forgeries of S's signature, and to deal with and 
resolve S's attempts to deny his valid signature. Central to both problems is the 
compromise or disclosure of S's secret information, i.e. his secret key or signing 
key a. That is, if S's secret information is compromised, our scheme is no longer valid 
or secure. However, our scheme is based on the discrete logarithm problem: computing 
discrete logarithms over finite fields is very difficult. So it is very difficult for 
ADV to compute a, S's secret information, from the equation $y = x^a \bmod p$ [4]. 

However, even though ADV cannot make a valid signature of S, our scheme is in itself 
vulnerable to accepting a wrong signature as a valid one. That is, V may accept y' as 
a valid signature for message x with a very small probability, where 
$y' \not\equiv x^a \pmod p$. The same problem exists in the Chaum-van Antwerpen 
undeniable signature scheme, where the probability is 1/q [2]. 

In our scheme, the signing algorithm does not differ from the signing algorithm of the 
Chaum-van Antwerpen undeniable signature scheme. Accordingly, our scheme might still 
regard an invalid signature as a valid one. S can deny or repudiate his signature, 
because an invalid signature might be accepted as valid, and a signature which is 
verified successfully might be a forgery. A detailed description of the probability of 
accepting a fraudulent signature as valid is given in Fig. 5. 

First, each possible challenge c in the verification and disavowal steps corresponds to 
exactly q ordered pairs $(r_v, e_1, e_2)$. This is because y and $\beta$ are both 
elements of the multiplicative group G of prime order q. 

Second, when S receives the challenge c, he has no way of knowing which of the q 
possible ordered pairs $(r_v, e_1, e_2)$ was used to construct the challenge c. 

Third, let y be not congruent to $x^a \pmod p$. Then any possible response $d \in G$ 
that S might make is consistent with exactly one of the q possible ordered pairs 
$(r_v, e_1, e_2)$. 



Since $\alpha$ generates G, any element of G can be written as a power of $\alpha$, 
where the exponent is defined uniquely modulo q. So, write $c = \alpha^i$, 
$d = \alpha^j$, $x = \alpha^k$, $y = \alpha^l$, $r_v = \alpha^s$, where 
$i, j, k, l, s \in \mathbb{Z}_q$ and all arithmetic is modulo p. Consider the 
following two congruences: 

$c \equiv r_v^{a}\, y^{e_1} \beta^{e_2} \pmod p$ 
$d \equiv r_v\, x^{e_1} \alpha^{e_2} \pmod p$ 

This system is equivalent to the following system: 
$i \equiv as + l e_1 + a e_2 \pmod q$ 
$j \equiv s + k e_1 + e_2 \pmod q$ 

This system can be represented as below: 

$\begin{pmatrix} i \\ j \end{pmatrix} \equiv \begin{pmatrix} l & a \\ k & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ s + e_2 \end{pmatrix} \pmod q$ 

Since $y \not\equiv x^a \pmod p$, we have $l \not\equiv ak \pmod q$. Hence the 
coefficient matrix of the above system of congruences modulo q has nonzero 
determinant $l - ak$, and thus there is a unique solution to the system. That is, 
every $d \in G$ is the correct response for exactly one of the q possible ordered pairs 
$(e_1, s + e_2)$. Accordingly, the probability that S gives V a response d that will be 
verified is exactly 1/q. 



Fig. 5. The probability of accepting a fraudulent signature as valid 
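
The uniqueness argument of Fig. 5 can be checked by brute force for a small modulus. The sketch below is only an illustration under assumed toy values (q = 11, a = 3, k = 4, l = 5); the function name is hypothetical. It counts the pairs (e1, u), with u standing for s + e2, that satisfy both congruences for a given (i, j).

```python
def count_solutions(q: int, a: int, k: int, l: int, i: int, j: int) -> int:
    """Number of pairs (e1, u) in Z_q x Z_q with
       i = l*e1 + a*u (mod q) and j = k*e1 + u (mod q), where u = s + e2."""
    return sum(
        1
        for e1 in range(q)
        for u in range(q)
        if (l * e1 + a * u) % q == i and (k * e1 + u) % q == j
    )


q, a, k = 11, 3, 4

# Forged signature: l != a*k (mod q), so the determinant l - a*k is nonzero
# and every (i, j) has exactly one solution.
l_forged = 5
print({count_solutions(q, a, k, l_forged, i, j) for i in range(q) for j in range(q)})

# Valid signature: l == a*k (mod q), the matrix is singular, and each (i, j)
# has either no solution or q solutions.
l_valid = (a * k) % q
print({count_solutions(q, a, k, l_valid, i, j) for i in range(q) for j in range(q)})
```

The first set contains the single value 1, matching the 1/q bound; the second contains only 0 and q, which is the degenerate case of a genuine signature.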



As described in Fig. 5, the probability of accepting a fraudulent signature as valid is 
not improved, on the assumption that $(s + e_2)$ is regarded as one variable. However, 
due to the added authentication, the value of s is needed prior to the verification 
protocol or disavowal protocol, and getting that value of s directly is infeasible 
because it depends on solving the discrete logarithm problem. Of course, ADV might get 
the value of $(s + e_2)$; however, the complexity of deciding each value depends on the 
size of q. In Fig. 5, the sum of s and $e_2$ is a constant less than q, and there can 
be many pairs of such s and $e_2$. Accordingly, we can say that our scheme can be more 
secure and reliable than the Chaum-van Antwerpen undeniable signature scheme, not 
computationally but logically. In other words, the difference that makes our scheme 
more secure and reliable depends on the three challenges s, $e_1$, and $e_2$. These 
challenges are used only once and are changed whenever the verification protocol or 
disavowal protocol is performed; accordingly, we can say that our scheme can be more 
reliable than the Chaum-van Antwerpen undeniable signature scheme, without loss of 
generality. 

In our scheme, two responses z and t are used for authenticating S. Because t is made 
from f, $r_v$, and REG, calculating a right t is impossible without knowing f and 
REG. Even if ADV succeeds in getting a t used earlier and knows f, he can hardly get 
REG, due to the irreversibility of the one-way function f. The value z is used to 
assure that S knows a, his signing key or private key, which is used to sign. ADV 
cannot get S's signing key from z, because of the computational complexity of the 
discrete logarithm problem. 

Moreover, since S's smart card is tamper-resistant, ADV cannot get REG or other 
information from S's smart card in an improper way. In other words, if S can pass the 
verification protocol, we can say that S must be using his valid smart card, know his 
secret information or signing key, and be a registered user of V's system. Accordingly, 
an S who passes the verification protocol or disavowal protocol can hardly deny or 
repudiate his valid signature. 



5 Conclusion 

This paper has presented a new undeniable signature scheme using smart cards, based 
on the Chaum-van Antwerpen undeniable signature scheme. On the whole, our scheme 
relies on the difficulty of computing discrete logarithms over finite fields, on three 
random challenges, and on the tamper-resistance of smart cards. 

There are two types of data which play a sensitive and crucial role in our scheme. The 
first is the signer's signing key, which is used whenever his signature is made. If 
the signer's signing key is compromised, our scheme is no longer secure or reliable. 
However, adversaries can hardly get the signer's signing key from the signer's valid 
signature, because calculating the signing key requires solving the discrete logarithm 
problem, which is computationally infeasible. Accordingly, as long as solving the 
discrete logarithm problem is infeasible, adversaries cannot get the signer's signing 
key and our scheme can be secure. 

The second is the three random numbers which are used to make challenges. In the 
Chaum-van Antwerpen undeniable signature scheme, two random numbers are used to lead 
the signer to cooperate in the verification of his signature. However, with a very 
small probability, a wrong signature might be accepted as a valid one. So, we used one 
more random number to lower the probability of mistakes. This additional random 
number is used to authenticate the signer of a signature, i.e. to check whether or not 
the signer is a legitimate user of the verifier's system. Moreover, if those random 
numbers are generated using a timestamp, then our scheme can be more effective and can 
withstand replay attacks. 

With the added registration feature and the random number used as a challenge, 
authentication is possible in our scheme. Due to authentication and the adoption of 
one more challenge, we can say that the probability of mistakenly accepting a wrong 
signature as a valid one is lower than in the Chaum-van Antwerpen undeniable signature 
scheme, without loss of generality. Moreover, because our scheme introduces a 
registration step, it can be used in user authentication schemes with a strict logging 
feature wherever a transaction occurs, and so on. If the signer's private key or 
signing key is also stored on his smart card, and biometrics such as a fingerprint are 
used to access that key, then the probability of repudiation of his valid signature 
can be lowered or removed. 
Abstract. The results of development and research of non-binary block 
inseparable codes of the mBnq type (m > n, q > 2), which combine error detection and 
correction with an increased information rate, are given. The code words are 
represented by points (vectors) in n-dimensional discrete space with the 
Minkowski metric (modular metric), appropriate to unsymmetrical multilevel 
channels. Estimates of the lower and upper bounds of correcting mBnq 
codes are given. An algorithm for quasi-compact packing of the code words 
with a given code distance in n-dimensional q-ary space with the Minkowski 
metric is proposed. A block synchronization method based on unallowed 
combinations of q-ary digits appearing on the borders of the code words is 
considered. This method does not reduce the information rate. An estimate of 
the error probability in Gaussian channels when 5-ary and 7-ary 
correcting codes are used is given. The connection between Hadamard codes and mBnq 
codes is established. 



1 Fundamental Concepts and Definitions 



Let us use the following designations and definitions: 

$N = q^n$ - number of possible n-digit words of the q-ary alphabet; 

$M = 2^m$ - number of m-bit binary words (the power of the mBnq code); 

$F_b$ - clock rate of the binary signal; 

$F_q$ - clock rate of the q-ary signal; 

$p = m/n = F_b/F_q$ - coefficient of variation of the specific information rate; at 
p > 1 the code is called frequency-compact; 

t - multiplicity of errors: the number of errors in the code word; 

h - order of an error: the number of discrete levels by which a single code pulse 
varies; 

$r = \frac{\log_2 q}{p} - 1$ - information redundancy of the mBnq code. 

If in the initial binary sequence the digits 0 and 1 are uncorrelated and 
equiprobable, then p is the information quantity contained in one q-ary digit. 

The words selected from the set of N words for the mBnq mapping are called allowed; 
the remaining words are called unallowed. Then 

$\mu = \frac{N - M}{M} = \frac{q^n}{2^m} - 1$   (1) 

is the number of unallowed words of an mBnq code per allowed word. 
Let us call the value $\mu$ the redundancy of the mBnq conversion. Let us introduce 
the concept of a prime mBnq code. 

Definition 1. An $m_p$B$n_p$q code is called prime if, for a given information 
redundancy r, the length of its code words (blocks) is minimal. 

For example, the code 4B3T is prime for q = 3, p = 1,33 and r = 0,19; the code 
2B1QI is prime for q = 5, p = 2 and r = 0,16. Prime codes obviously have the 
minimum redundancy of conversion $\mu$. 
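
The quantities p, r and the redundancy of conversion can be recomputed directly from m, n and q. The following is a minimal sketch that assumes the definitions p = m/n, r = log2(q)/p - 1 and (1) as read above; the function names are illustrative and do not come from the paper.

```python
from math import log2


def rate(m: int, n: int) -> float:
    """p = m/n: binary digits carried per q-ary digit."""
    return m / n


def information_redundancy(m: int, n: int, q: int) -> float:
    """r = log2(q)/p - 1."""
    return log2(q) / rate(m, n) - 1


def conversion_redundancy(m: int, n: int, q: int) -> float:
    """Formula (1): unallowed words per allowed word, q^n / 2^m - 1."""
    return q ** n / 2 ** m - 1


for name, m, n, q in [("4B3T", 4, 3, 3), ("2B1QI", 2, 1, 5)]:
    print(name,
          round(rate(m, n), 2),
          round(information_redundancy(m, n, q), 2),
          round(conversion_redundancy(m, n, q), 4))
```

For 4B3T and 2B1QI this gives r = 0.19 and r = 0.16, in agreement with the values quoted in the example.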

It follows from (1) that the value $\mu$ of the codes $cm_p$B$cn_p$q increases as c 
grows, where c is a natural number; i.e. the number of unallowed code words per 
allowed word increases. Therefore the following is true. 

Statement 1. If there is a prime code with p > 1, r > 0, then for sufficiently large c 
there is a frequency-compact correcting code $cm_p$B$cn_p$q. 



2 Metrics of Code Space 

When choosing the metric of the code space for mBnq codes, it is necessary to take 
into account that for q-ary signals the transmission channel is unsymmetrical, since 
the probability of high-order errors $P_{er}(h > 1)$ is much less than the probability 
of first-order errors $P_{er}(h = 1)$. The Minkowski metric (modular metric) 
corresponds to this condition. 

At h = 1 the distances in the Minkowski and Hamming metrics coincide. 



3 Estimation of a Length of mBnq Codes. Some mBnq Codes 



An estimate of the length n of an mBnq code that, at given values q and p, ensures 
correction in the Minkowski metric of all errors of order h = 1 with multiplicity up 
to t inclusive has the form 



2n(loq2q-p) 



2t-l . 

> Y C‘ 
Si 



( 2 ) 



where $\lfloor x \rfloor$ is the integer part of the number x. 
At t = 1 



2n{loq2q-p) > . ( 3 ) 

Using formulas (2) and (3), the following tables were calculated; they list the 
parameters of some mBnq codes. 

Table 1. Ternary codes (q = 3, t = 1) 

Prime code   p      r      n    Code 
1B1T         1,00   0,68   6    6B6T 
5B4T         1,25   0,27   16   20B16T 
4B3T         1,33   0,19   21   28B21T 
3B2T         1,50   0,05   84   126B84T 

Table 2. Quintary codes (q = 5) 

                           t = 1            t = 2 
Prime code   p      r      n    code        n    code 
4B3QI        1,33   0,74   3    4B3QI       -    - 
3B2QI        1,50   0,55   4    6B4QI       16   24B16QI 
7B4QI        1,75   0,33   8    14B8QI      28   49B28QI 
2B1QI        2,00   0,16   17   34B17QI     59   118B59QI 



Table 3. Septimary codes (q = 7) 

                             t = 1               t = 2 
Prime code    p     r        n    code           n    code 
2B1(q = 7)    2,0   0,40     5    10B5(q = 7)    10   20B10(q = 7) 
5B2(q = 7)    2,5   0,24     20   50B20(q = 7) 







Table 4. Hexadecimal codes (q = 16) 

                              t = 1                t = 2 
Prime code     p     r        n    code            n    code 
5B2(q = 16)    2,5   0,60     2    5B2(q = 16) 
3B1(q = 16)    3,0   0,33     4    12B4(q = 16)    14   42B14(q = 16) 
7B2(q = 16)    3,5   0,14     12   42B12(q = 16) 







Note that (2) gives an estimate of the upper bound on the length of the code 
words of mBnq codes. With dense packing of the code words in the code space, mBnq 
codes at given q and p can turn out to be shorter than calculated by formula (2). 

As an example, we give one possible matrix of the code words of the code 
4B3QI (q = 5). 



4B3QI = 

040 220 021 241 
421 201 132 312 
043 223 443 403 
024 244 424 204 

(4) 



4 Codes Generated by Generalized Hadamard Matrixes 



The class of non-binary correcting codes constructed on the basis of the mathematical 
apparatus of Generalized Hadamard Matrixes (GHM) is known. Although p < 1 for these 
codes, in some cases they can be useful because they are characterized by a high 
correcting ability (large code distance). 
The following theorem defines the existence and the construction algorithm of 
non-binary inseparable correcting codes generated by GHM. 

Theorem 1. If there is a Generalized Hadamard Matrix H(q, n), where $q = p^s$, p is a 
prime number, and s is a natural number, then there exists a q-ary code containing 
M = qn words of length n with Hamming code distance 

$d_H = \frac{n(q-1)}{q}$. 

Let us designate these codes as Had(q, n). 

Statement 2. There are Hadamard Matrixes $H(p^s, p^s)$ and codes generated by 
them, $Had(p^s, p^s)$. 

Hadamard codes are optimum in the sense that, at given q and n and at 

$d_H = \frac{n(q-1)}{q}$, 

the power of the code M is maximum. Besides, they have a series of useful properties: 
they are balanced, and the rows and columns of the code word matrix are orthogonal. 
Unlike nonlinear mBnq codes, the codes Had(q, n) are linear over the field $GF(p^s)$. 
Theorem 2. In the Minkowski metric the code distance of the codes $Had(p^s, p^s)$ is 



AS 1 

7 P ^ 

dm= ^ atp?t2, 

4 

- 2 ^^“^) +2 at p = 2 . 



As an example, we give the code word matrix of the code Had(5, 5): 

Had(5, 5) = 

01234 02413 03142 04321 
12340 13024 14203 10432 
23401 24130 20314 21043 
34012 30241 31420 32104 
40123 41302 42031 43210 

(5) 

Here $d_H = 4$, $d_m = 6$. 
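
The two code distances quoted for Had(5, 5) can be recomputed from matrix (5). The sketch below assumes that the Minkowski (modular) metric is the sum of absolute differences of the digits, an interpretation that is not spelled out in the text but reproduces the stated value $d_m = 6$; the function names are illustrative.

```python
from itertools import combinations

# Code words of Had(5, 5), copied from matrix (5).
HAD_5_5 = [
    "01234", "02413", "03142", "04321",
    "12340", "13024", "14203", "10432",
    "23401", "24130", "20314", "21043",
    "34012", "30241", "31420", "32104",
    "40123", "41302", "42031", "43210",
]


def hamming(u: str, v: str) -> int:
    """Number of positions in which the two words differ."""
    return sum(a != b for a, b in zip(u, v))


def modular(u: str, v: str) -> int:
    """Assumed Minkowski (modular) distance: sum of absolute digit differences."""
    return sum(abs(int(a) - int(b)) for a, b in zip(u, v))


def min_distance(code, dist) -> int:
    """Code distance: minimum pairwise distance over all distinct code words."""
    return min(dist(u, v) for u, v in combinations(code, 2))


print(min_distance(HAD_5_5, hamming), min_distance(HAD_5_5, modular))
```

A first-order error (h = 1) changes one digit by one level, so both distances grow by 1; this is the sense in which the two metrics coincide at h = 1.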

There is a mutual dependence between Hadamard codes and mBnq codes: code matrixes of 
mBnq codes can be obtained by converting the code matrixes of Had(q, n). For example, 
crossing out four rows in matrix (5) gives a code 4B5QI with p = 0,8 and $d_m = 6$; 
crossing out the first column as well gives a code 4B4QI with p = 1 and $d_m = 4$. 

5 Block Synchronizations of Codes mBnq and Had(q,n) 

When decoding mBnq codes and Hadamard codes Had(q, n), block synchronization is 
necessary, i.e. the borders of the code words must be known in the decoder. Inserting 
special clock signals would sharply reduce the efficiency of the codes. Therefore, 
for block synchronization of the codes mBnq and Had(q, n), combinations of digits 
that appear only at the junctions of adjacent code words are used as clock signals. 

To explain this method of block synchronization, we consider the code matrixes (4) 
and (5). 

Analysis of the code matrix (4) of the code 4B3QI shows that the combinations of 
digits 00, 11, 33, 01, 03, 13, 41, 43 occur only at the junctions of the code words. 
Therefore they can be used as clock signals for block synchronization of the code 
4B3QI. 

In the code matrix (5) the combinations of digits 00, 11, 22, 33, 44 occur only at 
the junctions of the code words. These combinations are the clock signals for block 
synchronization of the code Had(5, 5). 
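
The synchronization pairs of Had(5, 5) can be found mechanically: list every pair of adjacent digits that occurs inside some code word, and keep the pairs that never occur inside a word but can occur where one word ends and the next begins. The sketch below is an illustration of that check; the function names are hypothetical.

```python
# Code words of Had(5, 5), copied from matrix (5).
HAD_5_5 = [
    "01234", "02413", "03142", "04321",
    "12340", "13024", "14203", "10432",
    "23401", "24130", "20314", "21043",
    "34012", "30241", "31420", "32104",
    "40123", "41302", "42031", "43210",
]


def internal_pairs(code):
    """Pairs of adjacent digits that occur inside at least one code word."""
    return {w[i:i + 2] for w in code for i in range(len(w) - 1)}


def junction_pairs(code):
    """Pairs formed by the last digit of one word and the first digit of the next."""
    return {u[-1] + v[0] for u in code for v in code}


def sync_pairs(code):
    """Pairs that can appear at a junction but never inside a code word."""
    return sorted(junction_pairs(code) - internal_pairs(code))


print(sync_pairs(HAD_5_5))
```

For matrix (5) the result is the five repeated-digit pairs, as stated above.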



6 Recursion Algorithm of Quasi- Compact Packing 

The packing of points in spaces with the Minkowski metric is considered. 

Let us introduce the following designations: 

- $\Pi(n, q)$ - n-dimensional q-ary space (n-dimensional cube); 

- $d_m$ - minimum distance between points densely packed in $\Pi(n, q)$; 

- $N = q^n$ - power of the set of all points in $\Pi(n, q)$ (at $d_m = 1$); 

- $\Pi(l, q)$ - l-dimensional hyperplane in $\Pi(n, q)$ (l-dimensional cube); 

- $L_i(l, q, d_m)$ - set of points densely packed in $\Pi(l, q)$ with distance $d_m$; 

- $d(L_j, L_k)$ - distance between the point sets $L_j$ and $L_k$. 

The recursion algorithm of quasi-compact packing of points in $\Pi(n, q)$ with 
distance $d_m$ is a step-by-step procedure. 

First step: construct the sets $L_1(l_m, q, d_m)$ and $L_2(l_m, q, d_m)$ at 
$d(L_1, L_2) = \lfloor d_m/2 \rfloor$, where $l_m$ is the minimum dimension of a 
subspace $\Pi(l_m, q)$ in which at least two points with distance $d_m$ can be placed. 
The set $Q_1 = L_1(l_m, q, d_m) \cup L_2(l_m, q, d_m)$ we shall call the base 
structure of the first order. 

Second step: construct the sets $L_1(l_{m+1}, q, d_m)$ and $L_2(l_{m+1}, q, d_m)$ 
using the base structure $Q_1$, and construct the base structure of the second order 
$Q_2 = L_1(l_{m+1}, q, d_m) \cup L_2(l_{m+1}, q, d_m)$. 

Third step: using $Q_2$, construct the sets $L_1(l_{m+2}, q, d_m)$ and 
$L_2(l_{m+2}, q, d_m)$ and the base structure 
$Q_3 = L_1(l_{m+2}, q, d_m) \cup L_2(l_{m+2}, q, d_m)$. 

And so on; the construction ends at step k, at which $l_{m+k-1} = n$. 

7 Estimation of Probability of Errors Not Corrected by mBnq 
Codes 

Generally, when the mBnq code allows correcting all errors with multiplicity up to 
t = k and order up to h = s inclusive, errors with multiplicity $k+1 \le t \le n$ and 
order $s+1 \le h \le q-1$ remain after decoding, and the full probability of errors 
$P_{er.dec.}$ is determined by the formula 

$P_{er.dec.} \approx \sum_{i=k+1}^{n} \sum_{j=s+1}^{q-1} p(t = i, h = j)$,   (6) 

where p(t = i, h = j) is the probability of an error with multiplicity t = i and 
order h = j in the digital stream at the input of the decoder. 

Let us assume, that there is the additive noise in a transmission channel, and 
signal-to-noise ratio, such, what probability of an error on a digit Per.dig. ^ 10 h = 1. 
Then it is expedient to apply an mBnq code correcting errors of multiplicity up to 
t = 2 and order h = 1. In this case formula (6) becomes simpler; for t = 1 we obtain 
the estimate 

Per.dec. = P(t=1, h=1), 

and for t = 2 

Per.dec. = P(t=2, h=1). 



At a Gaussian noise the distribution law of error probabilities in the code word is 
binomial. Therefore 



D d -il-n J- 

rer.dec. ^^er.dig. rer.dig.' 



where k = 1 at t = 1 and k = 2 at t = 2. 

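Probability of an error on a digit at h = 1 

$P_{er.dig.} = \frac{q-1}{q} \left[ 1 - \Phi\left( \frac{1}{q-1} \cdot \frac{U_{dig.}}{\sigma} \right) \right]$, 

where $U_{dig.}$ - amplitude of a maximum code pulse; 

$\sigma$ - mean-square voltage of the Gaussian noise; 

$\Phi(x) = \frac{2}{\sqrt{2\pi}} \int_0^x e^{-z^2/2}\, dz$. 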
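
The digit error probability above is easy to evaluate numerically, because the function $\Phi$ as defined here equals erf(x/√2). The following sketch assumes the formula as read above; the function names and the sample amplitudes are illustrative only.

```python
from math import erf, sqrt


def phi(x: float) -> float:
    """Phi(x) = (2/sqrt(2*pi)) * integral_0^x exp(-z^2/2) dz = erf(x/sqrt(2))."""
    return erf(x / sqrt(2))


def digit_error_probability(q: int, u_dig: float, sigma: float) -> float:
    """P_er.dig = (q-1)/q * (1 - Phi(U_dig / ((q-1)*sigma))) for errors of order h = 1."""
    return (q - 1) / q * (1 - phi(u_dig / ((q - 1) * sigma)))


# Hypothetical operating points: quintary signal, unit noise voltage.
for u in (8.0, 16.0, 24.0):
    print(u, digit_error_probability(5, u, 1.0))
```

At a fixed amplitude the argument of $\Phi$ shrinks as q grows, so more levels mean a higher digit error probability; this is the loss that the correcting codes are meant to recover.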
Table 5 gives the calculated energy gain $\Delta A$ provided by the code 
86B43QI (p = 2, t = 2) and the code 24B12QI (p = 2, t = 1) as compared with the code 
2B1QI. 



Table 5. Energy win of quintary codes 



Per. 


AA, dB 


86B43QI 


24B12QI 


10'’ 


3,0 


2,0 


10'' 


3,5 


2,4 


lO** 


3,7 


2,5 


10-^ 


3,9 


2,6 


IQ-IU 


4,0 


2,7 
Abstract. This paper presents a new deterministic attack against 
stream ciphers based on a nonlinear filter key-stream generator. By 
“deterministic” we mean that it avoids replacing the non-linear Boolean 
function by a probabilistic channel. The algorithm we present is based 
on a trellis, and essentially amounts to a Viterbi algorithm with a 
{0, 1}-metric. The trellis is derived from the Boolean function and the 
received key-stream. The efficiency of the algorithm is comparable to 
Golic et al.'s recent “generalized inversion attack” but uses an altogether 
different approach: it brings in a novel cryptanalytic tool by calling 
upon trellis decoding. 

Keywords: Boolean functions, stream ciphers, filter generator, 

Viterbi algorithm, Fourier transform. 



1 Introduction 

We consider the binary additive stream cipher depicted in figure 1. Several 
outputs of a binary linear feedback shift register (LFSR) generator [2] are tapped 
so as to provide the input of a Boolean function [1][3] that produces a key-stream 
sequence. The secret key of the system is defined by the LFSR generator's initial 
state. 

We focus on the left part of figure 1, namely the generation of the key stream: 
the LFSR produces a PN (pseudo-noise) sequence $(a_i)_{i \in \mathbb{N}}$, and the 
i'th symbol $z_i$ of the key-stream is obtained from $(a_i)$ by applying f as follows: 

$z_i = f(a_i, a_{i+\lambda_0}, \ldots, a_{i+\lambda_{n-2}})$ 

The goal of the cryptanalyst is to recover the initial state $(a_0, \ldots, a_{K-1})$ 
of the LFSR, given the first N bits of the received key-stream sequence $(z_i)$. It is 
commonly assumed that the Boolean function f, the feedback polynomial g(x) 
of the LFSR, and the connection spacings $(\lambda_0, \ldots, \lambda_{n-2})$ between 
the PN sequence and the nonlinear filter function are known. 
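
The key-stream generation just described can be sketched in a few lines. All concrete choices below are assumptions made for illustration: a 4-stage LFSR with recurrence a_{i+4} = a_i XOR a_{i+1}, a three-input filter f(x1, x2, x3) = x1 XOR (x2 AND x3), and tap positions given directly as offsets from a_i. None of them come from the paper.

```python
def lfsr_sequence(init, taps, length):
    """PN sequence of a binary LFSR: a[i+K] is the XOR of a[i+j] for j in taps,
    where K = len(init) and init holds a[0..K-1]."""
    a = list(init)
    k = len(init)
    while len(a) < length:
        i = len(a) - k
        bit = 0
        for j in taps:
            bit ^= a[i + j]
        a.append(bit)
    return a[:length]


def keystream(a, f, offsets, n_out):
    """z[i] = f(a[i + o] for o in offsets); offsets[0] is normally 0."""
    return [f(*(a[i + o] for o in offsets)) for i in range(n_out)]


def toy_filter(x1, x2, x3):
    """Hypothetical balanced filter function, for illustration only."""
    return x1 ^ (x2 & x3)


pn = lfsr_sequence([1, 0, 0, 0], taps=(0, 1), length=30)
z = keystream(pn, toy_filter, offsets=(0, 1, 3), n_out=20)
print(pn[:15])
print(z)
```

The cryptanalyst sees only z and wants the first K bits of pn; knowing f, the taps and the offsets is the standing assumption stated above.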



B. Honary (Ed.): Cryptography and Coding 2001, LNCS 2260, pp. 402-414, 2001. 

Fig. 1. The cryptographic system. 



A number of cryptanalytic attacks on filter generators (and more general 
combination generators) consist of substituting a binary symmetric channel 
(BSC) [4] [5] for the Boolean function as in figure 2, so that the attack is converted 
into a decoding problem. Most of the literature concerning this topic is based 
on iterative a posteriori probability (APP) decoding [6] [7] [8], also called 
probabilistic decoding [9], applied to the BSC model in order to decrypt the system. 
This method benefits from the linear parity-check equations linking the input bits 
$a_i$ and takes advantage of the input/output cross-correlation to succeed in finding 
the original key (see [10], [11], [12], [13], [14]). 

However, a Boolean function is far from being equivalent to a binary symmetric 
channel. It is true that a balanced function f yields a symmetric behaviour of the 
transitions from $a_i$ to $z_i$, i.e. transition probabilities do not depend on 
particular values, namely $P(z_i = 0 \mid a_i = 0) = P(z_i = 1 \mid a_i = 1)$ and 
$P(z_i = 1 \mid a_i = 0) = P(z_i = 0 \mid a_i = 1)$, and this would tend to justify 
the BSC model; but the BSC is a Discrete Memoryless Channel [4] [5], while the 
encryption system in figure 1 has intrinsic memory induced by the connections between 
the LFSR and the Boolean function f. 
This memory degrades the true performance of iterative decoding. Moreover, 
the knowledge of the Boolean function structure should give us much more 
information on the system than just a transition probability that characterizes a 
BSC model. 




Fig. 2. Information theoretical model and real model: (a) probabilistic model, a 
Binary Symmetric Channel (p) producing the keystream; (b) deterministic model. 



Therefore, as in [15], [16], [17], we choose to keep the initial deterministic 
model to take advantage of the exact Boolean function characteristics. In this 
paper, we present a hard decision algorithm that recovers the initial state of the 
LFSR. Our strategy builds upon previous work by Anderson [15] in the sense 
that we strive to recover K “independent” bits of the PN sequence $(a_i)$, which 
is sufficient to reconstruct the initial state of the LFSR through linear algebra. 
The way we accomplish this differs, however: in summary, we keep track, as 
time t varies, of the set of possible states of a sliding window of the PN sequence 
$(a_t, a_{t+1}, \ldots, a_{t+m})$. The values of this window make up the states of a 
trellis diagram. After a surprisingly small number t of iterations, a sufficient 
number of individual bits $a_t$ of the PN sequence are determined and the algorithm 
terminates. 

The performance of this algorithm proves to be very good with a rather 
low complexity, namely $O(K)\,2^{\Lambda}$. This complexity is directly related to the 
number of inputs to the function f and their spacings $(\lambda_i)$, but hardly 
depends on the feedback polynomial g(x). In practice, when f is a resilient function 
with n = 8 input bits, $\lambda_i = 1$, and g(X) is a feedback polynomial of degree 
100, only about 200 bits of the key-stream $(z_i)$ are needed to recover the initial 
state of the LFSR. 

The paper is organized as follows. In section 2, we briefly present two known 
deterministic attacks (Anderson [15] and Golic et al. [17]) based on an extended 
table and a tree search respectively. Then, we describe in section 3 our Viterbi- 
like attack based on the Boolean function trellis representation. A generalization 
of our attack algorithm taking into account all linear forms is given in section 4 
where we also propose two improved forward-backward versions. Finally, some 
numerical results are presented in section 5 before drawing some conclusions. 

Notation. The following notation will be used in the sequel: 

• The vector space $\{0, 1\}^n$ of binary n-tuples is denoted $V_n$. 

• K is the degree of the LFSR connection polynomial g and N is the length 
of the key-stream sequence. 

• the original PN-sequence is denoted $a = (a_i)_{i \ge 0}$, 

• the key-stream sequence, also called received sequence in the channel decoding 
terminology, is denoted $z = (z_i)_{i=0}^{N-1}$, 

• f denotes the n-input filtering Boolean function, 

• the spacings between the inputs of the Boolean function as illustrated in 
figure 1 are denoted $(\lambda_0, \ldots, \lambda_{n-2})$, 

• $N_{dec}$ is the number of decoded bits, that is, the number of bits of the sequence 
$(a_i)$ recovered by our attack algorithm. These bits are correctly determined 
with probability one (zero error probability), 

• for any set E, the cardinality of E is denoted card(E), 

• the memory of f depends on all input bits spread in time between $x_1$ and $x_n$. 

This memory is defined by the integer yl = 1 -|- — k — ^ ■ Thanks to 

decimation techniques the real memory of the Boolean function 1 -|- 
can be reduced to $\Lambda$. The notation $f(x_1, x_2, \ldots, x_n)$ is sometimes 
replaced by $f(x_1, x_2, \ldots, x_\Lambda)$, which means that we artificially add 
“fake” or “degenerate” inputs to the function f so as to recover a situation where 
$\lambda_i = 1$ for every i. 

2 Previous Attacks Proposed by Anderson and Golic 
et al. 

We now mention two algorithms that are somewhat close to our work: the first one 
uses correlations between the output and the input of an “augmented” function. 
The second one is based on the construction of trees, and looks for one possible 
initialization of the LFSR. 



2.1 Block Correlation Attack 

Anderson's attack is block-wise oriented [15]. The key idea is to look for 
particular output patterns and link them to the Boolean function input. 

Let us describe this procedure in more detail: consider the filter function f 
and take the simple case where $\lambda_i = 1$ for all $i = 0, \ldots, n-2$, so the 
memory is $\Lambda = n$. Anderson defines an “augmented function” [15], 

$\Psi : V_{2n-1} \to V_n$ 

$(x_1, \ldots, x_{2n-1}) \mapsto (f(x_1, \ldots, x_n), f(x_2, \ldots, x_{n+1}), \ldots, f(x_n, \ldots, x_{2n-1}))$ 



The dependence between an output sequence and its corresponding input is 
well represented by $\Psi$: an output bit $z_n$ depends on n input bits 
$a_1, \ldots, a_n$, and each of these $a_i$ influences the value of n output bits 
$z_i$. Thus 2n - 1 inputs finally influence a total of n output bits, and that is what 
$\Psi$ expresses. In the general case where the spacings are not equal to 1, the PN 
sequence influences a block of $\Lambda$ bits and a total of $2^{\Lambda}$ output 
patterns are scanned during the attack (except for certain Boolean functions, as 
mentioned in [16]). 

Anderson's algorithm is based on the construction of the augmented table: 
for each output among the $2^n$ possible outputs of $\Psi$, one stores the 
corresponding inputs and checks whether there is a constant bit over these inputs. 
The complexity of this construction is therefore $2^{2n-1}$ when $\lambda_i = 1$. 

If all input vectors associated with a given output have a 0 (resp. 1) in the ith 
position, then the ith bit of the input vector is bound to be a 0 (resp. 1) each 
time this output is observed in the key-stream. 

Decoding enough independent bits of the input sequence enables us to invert 
the system and recover the initial state of the PN generator by linear inversion. 
When the function is built in such a way that no particular output vector satisfies 
the above property (such functions exist), it is impossible to get any information 
on the Boolean function input that holds with probability 1, and one must either use 
a further augmented function or follow up the attack by probabilistic decoding. 
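
The construction of the augmented table can be illustrated for a small filter function with all spacings equal to 1. This is a sketch of the idea as described above, not Anderson's original code; the function names and the two toy filter functions are assumptions made for the example.

```python
from itertools import product


def augmented_table(f, n):
    """Map every n-bit output block to the list of (2n-1)-bit inputs producing it,
    where output bit i is f applied to input bits i .. i+n-1."""
    table = {}
    for x in product((0, 1), repeat=2 * n - 1):
        out = tuple(f(*x[i:i + n]) for i in range(n))
        table.setdefault(out, []).append(x)
    return table


def forced_bits(inputs):
    """Positions where all stored inputs agree, as {position: forced value}."""
    first = inputs[0]
    return {
        i: first[i]
        for i in range(len(first))
        if all(x[i] == first[i] for x in inputs)
    }


def toy_filter(x1, x2, x3):
    """Hypothetical balanced filter function."""
    return x1 ^ (x2 & x3)


def and_filter(x1, x2, x3):
    """Deliberately weak (unbalanced) function, used to show forced bits clearly."""
    return x1 & x2 & x3


for name, f in (("toy", toy_filter), ("and", and_filter)):
    table = augmented_table(f, 3)
    leaks = {out: forced_bits(xs) for out, xs in table.items() if forced_bits(xs)}
    print(name, leaks)
```

Each entry of `leaks` is an output pattern whose appearance in the key-stream fixes some input bits with certainty; for the AND function the pattern (1, 1, 1) fixes all five input bits to 1.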



2.2 Generalized Inversion Attack 



The Generalized Inversion Attack, described in [17], is based on a finite tree
search. According to the previous notation, for a given initial state, a new tree
of height $K - \Lambda + 1$ is constructed. A tree node represents $\Lambda - 1$
bits, so the initial state takes $2^{\Lambda-1}$ different values.

The key idea of a forward attack is to expand the tree from time $t$ to time
$t+1$ by looking for the solutions of $z_t = f(x_1, x_2, \ldots, x_\Lambda)$. In
the latter equation, $z_t$ is known and the state $x_2, \ldots, x_\Lambda$ is
specified by the starting node. Such an equation may have no solution, a unique
solution or two solutions for the indeterminate $x_1$. Thus, depending on the
number of solutions (0, 1 or 2), we can draw 0, 1 or 2 edges out of the starting
node. The backward attack is similar and takes into account the solution relative
to $x_\Lambda$. The authors [17] suggested choosing exclusively between the
forward and the backward attack according to the level of correlation relating
$z_t$ to $x_1$ and $x_\Lambda$ respectively.

The total number of trees to be examined is $2^{\Lambda-1}$ in the worst case,
one per value of the initial state. Once the algorithm succeeds in building a
complete tree with a root of $\Lambda - 1$ bits and depth $K - (\Lambda - 1)$, the
key-stream is recomputed and compared to the observation. If they are the same,
the assumed initial memory state is accepted and the attack terminates. Using
branching processes theory, Golic et al. show that the typical number of
surviving nodes at level $K - (\Lambda - 1)$ is not exponential, but linear in
$K$, which gives a typical complexity $O(K 2^{\Lambda})$ for the attack.
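The branching rule of the forward attack (0, 1 or 2 edges per node) can be illustrated with a short Python sketch. It is only an illustration under assumptions: the 3-input filter `TRUTH` and the helper name `forward_solutions` are hypothetical, and a node is represented as the tuple of already known bits $(x_2, \ldots, x_\Lambda)$.

```python
# Hypothetical 3-input filter f(x1, x2, x3), chosen only for illustration.
TRUTH = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}

def forward_solutions(node, z):
    """Return the values of the unknown x1 such that f(x1, node) equals the key-stream bit z."""
    return [x1 for x1 in (0, 1) if TRUTH[(x1,) + tuple(node)] == z]

# The number of solutions is the number of edges drawn out of the node.
for node in [(0, 0), (0, 1)]:
    for z in (0, 1):
        print(node, z, forward_solutions(node, z))
```

With this filter, the node (0, 0) always has exactly one outgoing edge, while the node (0, 1) has two edges when $z_t = 0$ and none when $z_t = 1$, so the three cases of the text all occur.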
3 Trellis Representation and New Simple Deterministic Attack

Our algorithm is associated with a trellis which is built in the following
manner: a state in the trellis stands for a $\Lambda$-vector input of the Boolean
function $f$. The trellis graph contains $2^{\Lambda}$ states on each section. One
state $s$ is connected to two other states located in the next trellis section:
the first one corresponds to $s$ shifted once with a 0 as a new bit, and the
second one corresponds to $s$ shifted once with a new bit set to 1. For example,
the successors of the 5-bit state 01101 would be 00110 and 10110. The 1-bit
mapping on the trellis transitions is equal to $f$ applied to the successors of
the current state. For the sake of simplicity, we assume that there is no
difference of index between $a$ and $z$, i.e., the received bit $z_t$ corresponds
to the last input $a_t$.
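The transition rule can be written as a two-line helper. The sketch below assumes that a state is stored as an integer whose binary expansion is the state as printed in the text (leftmost bit most significant); the function name `successors` is a hypothetical choice.

```python
def successors(state, width):
    """Shift the state right by one and feed 0 or 1 in as the new leftmost bit."""
    shifted = state >> 1
    return shifted, shifted | (1 << (width - 1))

s0, s1 = successors(0b01101, 5)
print(format(s0, "05b"), format(s1, "05b"))  # 00110 10110
```

The branch leading to each successor is then labelled with $f$ evaluated on that successor.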

The basic version of our deterministic attack algorithm is described hereafter. 
Without loss of generality, the Boolean function is taken to be balanced. 

1. $t = 0$:
   Initialize $N_{dec} = 0$.
   Exactly half of the trellis states correspond to the received bit $z_0$ at
   $t = 0$.
   Discard all invalid states and store the survivors in a table.

2. $t > 0$:
   Suppose that all surviving states at time $t-1$ are known.
   At time $t$, according to the received bit $z_t$, the surviving states at
   $t-1$, and the mapping on the transitions, store only the new valid states
   matching $z_t$ with their incident branch mapping.
   Check the last (rightmost) bit of all survivors.

   • If it is constant, equal to $b$, then
      - $a_t = b$.
      - Increment the number of decoded bits, $N_{dec}$.
      - If $N_{dec} < K$, increment $t$ and return to step 2.
      - If $N_{dec} \geq K$: if the set of decoded bits contains $K$ independent
        bits, go to step 3. If not, increment $t$ and return to step 2.

   • If the last bit of the survivors is not constant, increment $t$ and go to
     step 2.

3. Invert the linear system to recover the initial state of the LFSR. The
   algorithm terminates.

The worst-case complexity of the above algorithm is $O(2^{\Lambda})$: at each
instant, one has to examine at worst $2^{\Lambda-1}$ trellis survivors and each
state has 2 outgoing transitions. The kernel of this algorithm is similar to a
Viterbi algorithm [18], [19], [20], with a hard-decision {0, 1}-branch metric.
The major difference appears in the add-compare-select unit and the storage
unit: we do not process cumulative metrics and we do not store the trellis paths
associated with surviving states.

To illustrate the trellis steps, let us take a simple example. We consider the 
3-input function defined by the truth-table: 
$(x_1, x_2, x_3)$    Output
000                  0
001                  0
010                  1
011                  1
100                  1
101                  0
110                  1
111                  0

This function is balanced. Its trellis graph contains $2^3 = 8$ states as
illustrated in Figure 3. Let us suppose that the received key-stream sequence is
0010111, and that the corresponding input sequence was generated by a
PN-sequence whose polynomial degree is $K$, where $K$ is arbitrary.

Fig. 3. Trellis of the Boolean function



• t=0: the surviving states are 0, 1, 5, 7 and $N_{dec} = 0$.

• t=1: the surviving states are 0, 7.

• t=2: the surviving states are 3, 4.

• t=3: the surviving states are 1, 5. These states all end with a 1-bit,
so $a_3 = 1$ and $N_{dec} = 1$.

• t=4: the surviving states are 2, 4, 6. These states all end with a
0-bit, so $a_4 = 0$ and $N_{dec} = 2$.

• t=5: the surviving states are 2, 3, 6.

• t=6: the only surviving state is 3. Therefore $a_6 = 1$ and in this particular
case, we also have $a_5 = 1$, and $N_{dec} = 4$.

We keep applying this procedure until we determine K independent bits to 
recover the initial LFSR state. 
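The survivor lists of this example can be replayed with a short Python sketch. It is a minimal illustration, not the authors' implementation: the names `TRUTH_TABLE`, `successors` and `run_basic_attack` are hypothetical, states are stored as integers whose binary expansion is $x_1 x_2 x_3$, and only the rightmost-bit rule of step 2 is applied (the final linear inversion of step 3 is left out).

```python
# Truth table of the 3-input example, indexed by the integer x1 x2 x3.
TRUTH_TABLE = [0, 0, 1, 1, 1, 0, 1, 0]
WIDTH = 3

def successors(state):
    """Shift right and insert 0 or 1 as the new leftmost bit."""
    shifted = state >> 1
    return shifted, shifted | (1 << (WIDTH - 1))

def run_basic_attack(keystream):
    """Return the survivors at each time and the bits a_t decoded with probability 1."""
    survivors = {s for s in range(1 << WIDTH) if TRUTH_TABLE[s] == keystream[0]}
    history = [sorted(survivors)]
    decoded = {}
    for t in range(1, len(keystream)):
        # Keep only the successors whose branch label matches the received bit.
        survivors = {nxt for s in survivors for nxt in successors(s)
                     if TRUTH_TABLE[nxt] == keystream[t]}
        history.append(sorted(survivors))
        last_bits = {s & 1 for s in survivors}
        if len(last_bits) == 1:  # the rightmost bit is constant over all survivors
            decoded[t] = last_bits.pop()
    return history, decoded

history, decoded = run_basic_attack([0, 0, 1, 0, 1, 1, 1])
for t, states in enumerate(history):
    print(t, states)
print(decoded)
```

Only the current set of survivors is carried from one section to the next (the `history` list is kept here for display), which reflects the remark that neither cumulative metrics nor trellis paths need to be stored.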

Comment. Our Viterbi-like attack is related to Anderson’s and Golic et al.’s
work in the sense that none of them calls upon any BSC or otherwise
probabilistic modelling. We use the knowledge of the Boolean function, the
received key-stream sequence, and the memory of the system to get probability-1
information on the input bits $a_t$. In this respect our algorithm relates to
Anderson’s approach, though we use a “convolutional” procedure, as opposed to a
“block” strategy: surprisingly, this enables us to significantly shorten the
length of the needed observation, as our numerical results will show, while
taking the square root of the memory requirements.



4 Improved and Generalized Deterministic Attacks 

4.1 A First Simple Improvement : Forward-Backward Version 

An improvement of the generalized algorithm is to apply it twice on the same 
length of observation : once in a forward direction, once in a backward direction. 
The backward algorithm is a new decoding, with a new trellis (a backward trel- 
lis). Therefore, one can expect to get more relationships with the same amount 
of observation. 

This new attack is not a usual “forward and backward” attack: the “forward
backward” algorithm developed in [21] uses the information coming from both
the forward and the backward trellis run-through to give a probability on one
bit in the trellis, while our forward and backward attacks are completely
independent. As regards Golic’s forward or backward attacks, they are only
related to the properties of the Boolean function, and they are not both
performed in the same decoding step.

In our case, the backward attack can be initialized with the survivors re- 
maining at the end of the forward attack, but we gain very little ; moreover, 
we observed that in most cases, we don’t gain much in doing several forward- 
backward attacks, because most of the time, the sets of survivors at the second 
iteration almost coincide with the ones found at the first iteration. Therefore, 
the attack in the second iteration soon becomes strictly identical to the first one, 
and we don’t decode anything more. 



4.2 Generalization 

Instead of checking only the last bit of the surviving states, one can look for
linear combinations between the bits of the survivors. Indeed, each constant
linear combination between the bits of the surviving states leads to a linear
relation satisfied by terms of the sequence $\{a_i\}$ and thus increases the
number of decoded bits. Without loss of generality, we assume in the sequel that
the spacings are regular, that is $\Lambda = n$, which simplifies the notation.
For our purposes, we define the Boolean function $\Phi_t$ ($\Phi$ for short when
$t$ does not need to be specified) by

$$\Phi_t : V_n \to \{0, 1\}, \qquad x \mapsto \begin{cases} 1 & \text{if } x \text{ survives at time } t \\ 0 & \text{otherwise} \end{cases}$$

We will call upon Fourier transform techniques and to this end need some
more notation.

The scalar product between two binary vectors $x$ and $y$ is defined on
$V_n \times V_n$ by:

$$(x, y) \mapsto x.y = \sum_{i=1}^{n} x_i y_i \bmod 2$$

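The Walsh transform $\hat{f}$ [3] of any real-valued function
$f : V_n \to \mathbb{R}$ is defined by:

$$\hat{f} : V_n \to \mathbb{R}, \qquad u \mapsto \hat{f}(u) = \sum_{x \in V_n} f(x)\,(-1)^{u.x}$$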



The Hamming weight of a vector equals the number of its nonzero components:

$$w : V_n \to \mathbb{N}, \qquad u \mapsto w(u) = \sum_{i=1}^{n} u_i$$

A balanced Boolean function $f$ is said to be $m$-resilient if, for any nonzero
vector $u \in V_n$ such that $w(u) \leq m$, we have $\hat{f}(u) = 0$.

A Boolean function $f$ is said to be $p$-degenerated if there exist a Boolean
function $g : \{0, 1\}^p \to \{0, 1\}$ and a $(p, n)$-matrix $A$ such that
$f = g \circ A$.
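The resiliency criterion above can be checked numerically on small functions. The following Python sketch is an illustration only: the helper names `walsh` and `resiliency_order` are hypothetical, a function is given as the list of its $2^n$ values indexed by the integer whose binary expansion is $x_1 \ldots x_n$, and the Walsh transform is computed naively from its definition.

```python
def walsh(truth, n):
    """Naive Walsh transform of a 0/1-valued function given as a list of 2^n values."""
    size = 1 << n
    return [sum(truth[x] * (-1) ** bin(u & x).count("1") for x in range(size))
            for u in range(size)]

def resiliency_order(truth, n):
    """Largest m with hat f(u) = 0 for all 1 <= w(u) <= m, or -1 if f is unbalanced."""
    hat = walsh(truth, n)
    if hat[0] != 1 << (n - 1):  # hat f(0) counts the ones of f
        return -1
    m = 0
    for weight in range(1, n + 1):
        if any(hat[u] != 0 for u in range(1, 1 << n)
               if bin(u).count("1") == weight):
            break
        m = weight
    return m

xor3 = [bin(x).count("1") % 2 for x in range(8)]   # x1 + x2 + x3
example = [0, 0, 1, 1, 1, 0, 1, 0]                 # 3-input example of Section 3
print(resiliency_order(xor3, 3))     # 2
print(resiliency_order(example, 3))  # 0
```

The sum of all $n$ variables is $(n-1)$-resilient, which gives 2 for `xor3`. The 3-input example of Section 3 is balanced but already has a nonzero Walsh coefficient at weight 1, hence the value 0.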

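The following lemma characterizes linear combinations of bits that are
constant on the support $Supp(\Phi) = \{x : \Phi(x) \neq 0\}$ of $\Phi$.

Lemma 1. If $|\hat{\Phi}(u)| = \hat{\Phi}(0)$, then

$$\lambda_u : V_n \to \{0, 1\}, \qquad x \mapsto u.x \pmod 2$$

is constant over $Supp(\Phi)$.

Proof. By definition, $\hat{\Phi}(u) = \sum_{x \in V_n} \Phi(x)(-1)^{u.x}$, and in
particular $\hat{\Phi}(0) = \sum_{x \in V_n} \Phi(x) = card(Supp(\Phi))$.

Let

$$A_0 = \{x \in V_n : \lambda_u(x) = 0\}, \qquad A_1 = \{x \in V_n : \lambda_u(x) = 1\}.$$

We have therefore

$$|\hat{\Phi}(u)| = \Bigl| \sum_{x \in A_0} \Phi(x) - \sum_{x \in A_1} \Phi(x) \Bigr| = \bigl| card(A_0 \cap Supp(\Phi)) - card(A_1 \cap Supp(\Phi)) \bigr| \leq card(Supp(\Phi))$$

and the equality holds if and only if $A_0 \cap Supp(\Phi)$ or
$A_1 \cap Supp(\Phi)$ equals the empty set. □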
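As a small worked instance of Lemma 1, take the survivors found at t=5 in the example of Section 3, namely states 2, 3 and 6, written as the vectors 010, 011 and 110. The choice of this set and of the vectors $u$ below is ours, for illustration only.

```latex
\begin{align*}
\hat{\Phi}(000) &= \Phi(010) + \Phi(011) + \Phi(110) = 3, \\
\hat{\Phi}(010) &= (-1)^{1} + (-1)^{1} + (-1)^{1} = -3, \\
\hat{\Phi}(001) &= (-1)^{0} + (-1)^{1} + (-1)^{0} = 1 .
\end{align*}
```

For $u = 010$ the modulus equals $\hat{\Phi}(0)$, so the middle bit is constant (equal to 1) over the three survivors. For $u = 001$ the modulus is strictly smaller, in agreement with the last bit not being constant at t=5.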

The second step of the algorithm then becomes:

At time $t$, according to the received bit $z_t$, the surviving states at
$t-1$, and the mapping on the transitions, store only the new valid states
matching $z_t$ with their incident branch mapping. Evaluate $\Phi_t$ defined
above, and its Walsh transform.

• If there is a value of $u$ such that $|\hat{\Phi}_t(u)| = \hat{\Phi}_t(0)$, then
store $u$ as a new relationship between the bits of the survivors.

   - If the number of linear relationships is less than $K$, increment $t$ and
     return to step 2.

   - Else, if one can find $K$ independent relationships, go to step 3. Else,
     increment $t$ and go back to step 2.

• If not, increment $t$ and go back to step 2.

Although this generalization increases the complexity of the algorithm, because
it requires the computation of $\hat{\Phi}_t$, which has a complexity of
$O(n 2^n)$, it allows us to decrypt the system with a smaller amount of
observation. In the basic version, we only checked the last bit of the
survivors, because the states at time $t$ are shifted copies of the states
obtained before, with a new bit. For the same purpose, in this new version one
should only pay attention to $u$ of the form $u = (* * * * 1)$, to avoid counting
the same relationships several times at different instants. The basic version
appears as a particular case of the generalized algorithm, in which one
evaluates only $\hat{\Phi}_t(0 \ldots 01)$.
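The $O(n 2^n)$ cost quoted above is that of a fast Walsh-Hadamard transform. The Python sketch below is one way to organize the computation, under our own assumptions: the names `fast_walsh` and `linear_relations` are hypothetical, states are integers whose least significant bit is the last (rightmost) bit of the state, and only vectors $u$ whose last component is 1 are examined.

```python
def fast_walsh(values):
    """Fast Walsh-Hadamard transform: out[u] = sum over x of values[x] * (-1)^(u.x)."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def linear_relations(survivors, n):
    """Return pairs (u, c) such that u.x = c for every survivor x, with u odd."""
    phi = [0] * (1 << n)
    for s in survivors:
        phi[s] = 1
    hat = fast_walsh(phi)
    return [(u, 0 if hat[u] > 0 else 1)
            for u in range(1, 1 << n, 2)   # last component of u equal to 1
            if hat[0] > 0 and abs(hat[u]) == hat[0]]

print(linear_relations({1, 5}, 3))     # [(1, 1), (3, 1)]
print(linear_relations({2, 3, 6}, 3))  # []
```

For the survivors 001 and 101, the relation $u = 001$ is the one found by the basic version, while $u = 011$ is an additional relation that only the generalized test detects. For the survivors 010, 011, 110 no relation involving the last bit exists, so nothing is stored at that instant.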




4.3 Second Improvement 

This modification takes into account the previous remark: in Section 4.1, the
forward and the backward attacks are almost independent (almost, because the
backward starting point we choose can depend on the forward attack).

As in [21], let us combine both attacks to get more information at an instant
$t$: we denote by $Supp_{fwd}(\Phi_t)$, resp. $Supp_{bwd}(\Phi_t)$, the support of
$\Phi_t$ at time $t$ in the forward, resp. the backward, description of the
trellis. One runs through the trellis in a forward and in a backward way and
memorizes $Supp_{fwd}(\Phi_t)$ and $Supp_{bwd}(\Phi_t)$.

Then, we evaluate $\Phi_t$ on $Supp_{fwd}(\Phi_t) \cap Supp_{bwd}(\Phi_t)$.

The intersection of the supports is included in each support; therefore we can
expect to get fewer survivors, and consequently more linear relationships.
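The effect of intersecting the two supports can be seen on the 3-input example of Section 3. The Python sketch below is one possible reading of the procedure, stated as an assumption: the backward pass keeps a state at time $t$ if it matches $z_t$ and has at least one successor among the backward survivors at time $t+1$. The names `forward_supports` and `backward_supports` are hypothetical.

```python
TRUTH_TABLE = [0, 0, 1, 1, 1, 0, 1, 0]  # 3-input example, indexed by x1 x2 x3
WIDTH = 3
STATES = range(1 << WIDTH)

def successors(state):
    """Shift right and insert 0 or 1 as the new leftmost bit."""
    shifted = state >> 1
    return shifted, shifted | (1 << (WIDTH - 1))

def forward_supports(keystream):
    """Survivor sets obtained by running through the trellis from t = 0 upwards."""
    current = {s for s in STATES if TRUTH_TABLE[s] == keystream[0]}
    supports = [current]
    for z in keystream[1:]:
        current = {nxt for s in current for nxt in successors(s)
                   if TRUTH_TABLE[nxt] == z}
        supports.append(current)
    return supports

def backward_supports(keystream):
    """Survivor sets obtained by running through the trellis from the last bit downwards."""
    current = {s for s in STATES if TRUTH_TABLE[s] == keystream[-1]}
    supports = [current]
    for z in reversed(keystream[:-1]):
        current = {s for s in STATES if TRUTH_TABLE[s] == z
                   and any(nxt in current for nxt in successors(s))}
        supports.append(current)
    return supports[::-1]

keystream = [0, 0, 1, 0, 1, 1, 1]
forward = forward_supports(keystream)
backward = backward_supports(keystream)
combined = [f & b for f, b in zip(forward, backward)]
print([sorted(c) for c in combined])
```

On this short key-stream, every intersection reduces to a single state, whereas the forward supports alone contain up to four states. Fewer survivors means that more linear forms are constant on the support, which is the gain expected from this second improvement.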
5 Numerical Results

We have implemented our algorithms with a filtered LFSR defined by the poly- 
nomial g{x) = 1-1- + X^ + K = 100, and some initializations of 

the register; the choice of the initialization hardly affects the performance of
the algorithm, which mostly relates to the Boolean function $f$. Note that the
weight of the polynomial does not have any influence either.

The following tables contain the minimum amount of observation that enables one
to recover the initialization of the LFSR. We present some typical results
obtained with resilient functions. These functions have good cryptographic
properties and are widely used.

Our first example is with the 2-resilient 5-input function used by Anderson
in his article,

$$f(x) = x_1 + x_2 + (x_1 + x_3)(x_2 + x_4 + x_5) + (x_1 + x_4)(x_2 + x_3)x_5$$

and $\forall i$, $\lambda_i = 1$.

The last two columns give the number of bits required to recover unambigu- 
ously the initial state of the LFSR. 





                                      Forward    Forward-Backward
basic algorithm                       465        201
generalized algorithm                 288        185
2nd improved generalized algorithm    -          185



Our second example is with the 2-resilient 8-input function : 

/(x) = Xtl«4 + X5 + X6 + X7 + Xl(X2 + X 7 ) + X2 (X6 + X 7 ) -f X3(X6 -f Xs) 
HXiX2(X4 -f X6 + Xs) -t XlXs(X2 -f Xe) -f XlX2Xs(X4 + + Xg) 

and $\forall i$, $\lambda_i = 1$.





                                      Forward    Forward-Backward
basic algorithm                       1044       337
generalized algorithm                 899        268
2nd improved generalized algorithm    -          141



Finally, we used different $\lambda_i$’s, with the same 5-input 2-resilient
function as in our first example. The first column gives the value of
$(\lambda_0, \lambda_1, \lambda_2, \lambda_3)$. The results are given when
applying the generalized algorithm, which takes into account all possible linear
forms applied to survivors.



$(\lambda_0, \lambda_1, \lambda_2, \lambda_3)$    Forward    Forward-Backward    2nd Improved Forward-Backward
(2, 3, 1, 2)                                      299        221                 185
(3, 1, 2, 2)                                      369        270                 184
(1, 3, 1, 3)                                      215        182                 157
6 Conclusions

We derive a new deterministic algorithm that enables one to recover the LFSR 
initial state ; our algorithm is based on the construction of a trellis graph, which is 
built according to the Boolean function, the connections between the LFSR and 
the function, and the received key-stream. We gave a generalization of the basic 
algorithm that slightly increases the computational complexity, but shortens the 
amount of observed bits of the key-stream sequence. The two modifications of 
the algorithm rely on a “double” use of the received key-stream, first in a forward 
way, next in a backward way. They can be done either on the basic version or 
on the generalized version, and they prove to be efficient in both cases. 

The results turn out to be very good: we generally need very few key-stream
bits (of the order of $K$) and therefore little time to recover the LFSR initial
state. The typical complexity of the attack therefore seems to be
$O(K 2^{\Lambda})$, i.e. very much comparable to the typical complexity of the
generalized inversion attack of Golic et al. Compared to the latter, our
algorithm has the disadvantage of needing a slightly longer key-stream. However,
it determines individual bits of the PN sequence $(a_i)$ irrespective of their
linear structure; this is structurally simpler and may arguably be seen as an
advantage, since other stream ciphers might try feeding a more complicated type
of sequence $(a_i)$ to the Boolean function $f$. Furthermore, we hope that our
approach will lead to using other, more involved trellises in cryptanalysis.